PREFACE



This volume includes a selection of tutorial and technical papers presented at the 3rd
GAMM/IFIP-Workshop on "Stochastic Optimization: Numerical Methods and
Technical Applications", held at the Federal Armed Forces University Munich,
Neubiberg/Munich, June 17-20, 1996.

Optimization problems arising in practice usually contain several random parameters.
Hence, in order to obtain solutions that are robust with respect to random parameter
variations, i.e., to reduce expensive online measurements and corrections, the statistical
information that is usually available (samples, moments, etc.) about the random
parameters should be taken into account already at the planning phase. Thus, the original
problem with random coefficients must be replaced by an appropriate deterministic
substitute problem, and efficient numerical solution/approximation techniques have to be
developed for solving the resulting substitute problems.

For example, evaluating the violation of the random constraints by means of penalty
functions, or applying a reliability-based approach, one obtains a stochastic program with
recourse or a chance-constrained stochastic program, respectively.

When solving the chosen deterministic substitute problem, one then has to deal with the
numerical evaluation of probability and mean value functions (represented by certain
multiple integrals) and their derivatives.

Therefore, the aim of the 3rd GAMM/IFIP-Workshop on "Stochastic Optimization"
was also to bring together scientists from Stochastic Programming, Numerical Optimization
and Reliability-based Engineering Optimization, e.g. Optimal Structural
Design, Optimal Trajectory Planning for Robots, Optimal Power Dispatch, etc.

The following Scientific Program Committee was formed: 

H.A. Eschenauer (Germany) 

P. Kall (Switzerland)

K. Marti (Germany, Chairman) 

J. Mayer (Switzerland) 







F. Pfeiffer (Germany) 

R. Rackwitz (Germany) 

G. I. Schueller (Austria). 

The first day of the Workshop was devoted mainly to four one-hour tutorial papers on
the main topics of the Workshop: modelling aspects, approximation and
numerical solution techniques, and technical applications of stochastic optimization. The
tutorials are contained in part I. TUTORIAL PAPERS; the technical contributions
providing new theoretical results, numerical solution procedures and new applications
to reliability-based optimization of technical structures/systems are divided into the
following three parts: II. THEORETICAL MODELS AND CONCEPTUAL METHODS,
III. NUMERICAL METHODS AND COMPUTER SUPPORT and IV. TECHNICAL
APPLICATIONS.

In order to guarantee again a high scientific level of the Proceedings volume of the third 
workshop on this topic, all papers were refereed. We express our gratitude to all 
referees, and we thank all authors for delivering the final version of their papers in due 
time. 

We gratefully acknowledge the support of the Workshop by GAMM (Society for 
Applied Mathematics and Mechanics), IFIP (International Federation of Information 
Processing), the Federal Armed Forces University Munich, the Friends of the Uni- 
versity, and we thank the commander of the student division for the kind accom- 
modational support. 

Finally we thank Springer-Verlag for including the Proceedings in the Springer Lecture
Notes Series.



München/Zürich
October 1997 



K. Marti, P. Kall
Bounds for and Approximations to
Stochastic Linear Programs
with Recourse*

— Tutorial — 



P. Kall

IOR, University of Zurich, Moussonstr. 15, CH-8044 Zurich



Abstract. The objective of stochastic linear programs with recourse contains,
in general, a multivariate integral $\mathcal{Q}(x) = \int_\Xi Q(x,\xi)\,P(d\xi)$. This integral,
although having convenient properties under mild assumptions (e.g. convexity,
smoothness), causes difficulties in computational solution procedures.
Therefore we usually replace $\mathcal{Q}(\cdot)$ by successively improved lower and upper
bounding functions more amenable to optimization procedures, the involved
bounding functions being solutions to various (generalized) moment problems.

Keywords. 90C15 (1991 MSC) 

1 Stochastic Programs with Recourse

Consider a stochastic program of the type

(1.1)   $\min_{x \in X} \{ c^T x + \mathcal{Q}(x) \}$

under the assumptions:

- $X \subset \mathbb{R}^n$ convex polyhedral,

- $\mathcal{Q}(x) := \int_\Xi Q(x,\xi)\,P(d\xi)$ with $\Xi \subset \mathbb{R}^K$ a convex polyhedron, $P$ a
probability measure on $\Xi$,

- $Q(x,\xi) := \min_y \{ q^T y \mid Wy = h(\xi) - T(\xi)x,\ y \ge 0 \}$ with $h(\cdot)$, $T(\cdot)$ linear
affine in $\xi \in \Xi$, i.e. $h(\xi) := h^0 + \sum_{i=1}^K h^i \xi_i$ and $T(\xi) := T^0 + \sum_{i=1}^K T^i \xi_i$, where
$h^i \in \mathbb{R}^m$ and $T^i$ are $m \times n$-matrices, $i = 0, \ldots, K$.

* Revised version of a paper presented at IFIP WG 7.7 Workshop (and Tutorial) on
Stochastic Optimization; Tucson, AZ, Jan 15-19, 1996.

Then we have 

Proposition 1.1 $Q(x, \cdot) : \Xi \to \mathbb{R}$ is convex $\forall x \in X$.

and

Proposition 1.2 Provided the integral exists, $\mathcal{Q}(\cdot) : X \to \mathbb{R}$
is convex.

In spite of the last fact, we cannot solve (1.1) by just applying any iterative
method for convex programming, since the repeated evaluation of $\mathcal{Q}(x)$ (and
possibly of its gradient) would involve repeated multivariate integration, which
in general cannot be done efficiently. Hence we usually try to approximate
$\mathcal{Q}(x)$ by lower and upper bounding functions which are easier to deal with.

2 Bounds for Univariate Integrals 

Consider

(2.1)   $\int_\Xi \varphi(\xi)\,P(d\xi)$.

With $\Xi := [\alpha_0, \alpha_1] \subset \mathbb{R}$, $\varphi : \Xi \to \mathbb{R}$ convex,

we have

Proposition 2.1 (Jensen inequality [15]) For $\bar\xi := E_P\,\xi$ holds

(2.2)   $\varphi(\bar\xi) \le \int_\Xi \varphi(\xi)\,P(d\xi)$,

reducing to an equality if $\varphi$ is linear on $\Xi$.

Sketch of the proof: For any discrete distribution $P$,

$\{ (\xi_i, p_i);\ i = 1, \ldots, r \}$, $p_i \ge 0$, $\sum_{i=1}^r p_i = 1$,

it follows from convexity that

$\varphi\big( \sum_{i=1}^r \xi_i\,p_i \big) \le \sum_{i=1}^r \varphi(\xi_i)\,p_i = \int_\Xi \varphi(\xi)\,P(d\xi)$.

The general case follows by approximation with sequences of simple functions on $\Xi$
that are mean fundamental (w.r.t. $P$) (see Halmos [13]). □

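On the other hand, for $\varphi : \Xi \to \mathbb{R}$ a convex function on $\Xi := [\alpha_0, \alpha_1] \subset \mathbb{R}$,
we have

Proposition 2.2 (Edmundson-Madansky inequality [9, 26]) With $\bar\xi := E_P\,\xi$,
for the (E-M) distribution

(2.3)   $\left\{ \left( \alpha_0,\ p_{\alpha_0} = \frac{\alpha_1 - \bar\xi}{\alpha_1 - \alpha_0} \right),\ \left( \alpha_1,\ p_{\alpha_1} = \frac{\bar\xi - \alpha_0}{\alpha_1 - \alpha_0} \right) \right\}$

holds the (E-M) inequality

(2.4)   $\int_\Xi \varphi(\xi)\,P(d\xi) \le \varphi(\alpha_0) \cdot p_{\alpha_0} + \varphi(\alpha_1) \cdot p_{\alpha_1}$.

Proof: For $\xi \in [\alpha_0, \alpha_1]$ holds

$\xi = \frac{\alpha_1 - \xi}{\alpha_1 - \alpha_0} \cdot \alpha_0 + \frac{\xi - \alpha_0}{\alpha_1 - \alpha_0} \cdot \alpha_1$

and hence, due to convexity,

$\varphi(\xi) \le \varphi(\alpha_0) \cdot \frac{\alpha_1 - \xi}{\alpha_1 - \alpha_0} + \varphi(\alpha_1) \cdot \frac{\xi - \alpha_0}{\alpha_1 - \alpha_0}$.

Integrating this inequality yields

$\int_\Xi \varphi(\xi)\,P(d\xi) \le \varphi(\alpha_0) \cdot \frac{\alpha_1 - \bar\xi}{\alpha_1 - \alpha_0} + \varphi(\alpha_1) \cdot \frac{\bar\xi - \alpha_0}{\alpha_1 - \alpha_0}$

and hence the hypothesis. □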
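As a numerical illustration of the Jensen bound (2.2) and the E-M bound (2.4), the following Python sketch evaluates both for one simple case. The function name `jensen_em_bounds`, the test function $\varphi(\xi) = \xi^2$ and the uniform distribution on $[-1, 1]$ are assumptions chosen for illustration only; they are not taken from the text.

```python
from fractions import Fraction

def jensen_em_bounds(phi, a0, a1, mean):
    """Return (Jensen lower bound, E-M upper bound) for E[phi(xi)],
    where supp P is contained in [a0, a1] and mean = E[xi]."""
    lower = phi(mean)                       # (2.2)
    p_a0 = (a1 - mean) / (a1 - a0)          # E-M weight of alpha_0, see (2.3)
    p_a1 = (mean - a0) / (a1 - a0)          # E-M weight of alpha_1, see (2.3)
    upper = phi(a0) * p_a0 + phi(a1) * p_a1  # (2.4)
    return lower, upper

# Assumed example: phi(xi) = xi^2, xi uniform on [-1, 1], so E[xi] = 0.
phi = lambda t: t * t
lower, upper = jensen_em_bounds(phi, Fraction(-1), Fraction(1), Fraction(0))
exact = Fraction(1, 3)  # E[xi^2] for the uniform distribution on [-1, 1]
print(lower, exact, upper)
```

Both bounds use only the support and the first moment of the distribution, which is why they are cheap but may be loose, as in this case.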



3 Bounds for Multivariate Integrals 

Let $\Xi := \times_{i=1}^{K} [\alpha_{i0}, \alpha_{i1}] \subset \mathbb{R}^K$, $\varphi : \Xi \to \mathbb{R}$ convex. Again we have

Proposition 3.1 (Jensen inequality [15]) For $\bar\xi := E_P\,\xi$ holds

(3.1)   $\varphi(\bar\xi) \le \int_\Xi \varphi(\xi)\,P(d\xi)$,

reducing to an equality if $\varphi$ is linear affine on $\Xi$.

Proof: In analogy to Prop. 2.1. □

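Consider now for $\xi \in \Xi$ the one-point distribution $P_\xi$: $P_\xi(\{\xi\}) = 1$. The
corresponding E-M distributions per component, $P_{\eta_i}$, $i = 1, \ldots, K$, are

$p(\alpha_{i0}; \xi_i) := P_{\eta_i}(\eta_i = \alpha_{i0}) = \frac{\alpha_{i1} - \xi_i}{\alpha_{i1} - \alpha_{i0}}$,

$p(\alpha_{i1}; \xi_i) := P_{\eta_i}(\eta_i = \alpha_{i1}) = (-1)\,\frac{\alpha_{i0} - \xi_i}{\alpha_{i1} - \alpha_{i0}}$.

Denoting the vertices of $\Xi$ by $a^\nu$, with $\nu = (\nu_1, \ldots, \nu_K)^T$ and $\nu_i \in \{0, 1\}$,
such that $a^\nu = (\alpha_{1\nu_1}, \ldots, \alpha_{K\nu_K})^T$, and assuming the components $\eta_i$ of $\eta$ to be
stochastically independent, we get the joint distribution as

(3.2)   $p(a^\nu; \xi) := P_\eta(\eta = a^\nu) = \frac{\prod_{i=1}^K (-1)^{\nu_i} (\alpha_{i\bar\nu_i} - \xi_i)}{\prod_{i=1}^K (\alpha_{i1} - \alpha_{i0})}$,

where $\bar\nu_i = 1 - \nu_i$.

Observing that $E_{P_{\eta_i}}\,\eta_i = \xi_i$ and hence

(3.3)   $E_{P_\eta}\,\eta = \sum_\nu a^\nu\,p(a^\nu; \xi) = \xi$,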

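we have from Jensen's inequality

Proposition 3.2 For $\xi \in \Xi$ holds

$\varphi(\xi) \le \int \varphi(\eta)\,P_\eta(d\eta) = \sum_\nu \varphi(a^\nu)\,P_\eta(\eta = a^\nu) = \sum_\nu \varphi(a^\nu)\,p(a^\nu; \xi)$.

If $\varphi$ is linear on $\Xi$, then

$\varphi(\xi) = \sum_\nu \varphi(a^\nu)\,P_\eta(\eta = a^\nu) = \sum_\nu \varphi(a^\nu)\,p(a^\nu; \xi)$.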



From Prop. 3.2 follows immediately

Proposition 3.3 (Edmundson-Madansky inequality) For

(3.6)   $P^\circ(a^\nu) := \int_\Xi P_\eta(\eta = a^\nu)\,P(d\xi) = \int_\Xi p(a^\nu; \xi)\,P(d\xi)$

holds the E-M inequality

(3.7)   $\int_\Xi \varphi(\xi)\,P(d\xi) \le \sum_\nu \varphi(a^\nu)\,P^\circ(a^\nu)$.

If $\varphi$ is linear on $\Xi$, then

(3.8)   $\int_\Xi \varphi(\xi)\,P(d\xi) = \sum_\nu \varphi(a^\nu)\,P^\circ(a^\nu)$.



Observe that from (3.3) and (3.6) follows

(3.9)   $\sum_\nu a^\nu\,P^\circ(a^\nu) = \int_\Xi \sum_\nu a^\nu\,p(a^\nu; \xi)\,P(d\xi) = \int_\Xi \xi\,P(d\xi) = \bar\xi$.



To get the E-M distribution $P^\circ(a^\nu)$ explicitly we have to distinguish whether
the components $(\xi_1, \ldots, \xi_K)$ of $\xi$ are stochastically independent or not.

For the independent case we have due to Kall-Stoyan [21] 



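Proposition 3.4 If the components of $\xi$ are independent, then the E-M distribution is

(3.10)   $P^\circ(a^\nu) = \frac{\prod_{i=1}^K (-1)^{\nu_i} (\alpha_{i\bar\nu_i} - \bar\xi_i)}{\prod_{i=1}^K (\alpha_{i1} - \alpha_{i0})}$.

Proof: Due to (3.2) and Prop. 3.3 we have according to the assumed independence

$P^\circ(a^\nu) = \int_\Xi p(a^\nu; \xi)\,P(d\xi) = \int_\Xi \frac{\prod_{i=1}^K (-1)^{\nu_i} (\alpha_{i\bar\nu_i} - \xi_i)}{\prod_{i=1}^K (\alpha_{i1} - \alpha_{i0})}\,P(d\xi) = \frac{\prod_{i=1}^K (-1)^{\nu_i} (\alpha_{i\bar\nu_i} - \bar\xi_i)}{\prod_{i=1}^K (\alpha_{i1} - \alpha_{i0})}$.

□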
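The product formula (3.10) can be evaluated vertex by vertex. The following Python sketch does this with exact rational arithmetic; the function name `em_distribution_independent` and the sample interval and means are hypothetical choices for illustration, and the sketch assumes independent components as in Prop. 3.4.

```python
from fractions import Fraction
from itertools import product

def em_distribution_independent(bounds, mean):
    """E-M distribution (3.10) on the vertices of a K-dimensional interval.

    bounds: list of (alpha_i0, alpha_i1), given as ints or Fractions
    mean:   list of component means, given as ints or Fractions
    Returns a dict mapping each vertex (as a tuple) to its probability."""
    dist = {}
    for nu in product((0, 1), repeat=len(bounds)):
        prob = Fraction(1)
        vertex = []
        for (a0, a1), m, n in zip(bounds, mean, nu):
            # factor (-1)^nu_i (alpha_{i,1-nu_i} - mean_i) / (alpha_i1 - alpha_i0)
            numerator = (a1 - m) if n == 0 else (m - a0)
            prob *= Fraction(numerator, a1 - a0)
            vertex.append(a1 if n else a0)
        dist[tuple(vertex)] = prob
    return dist

# Assumed example: interval [0, 2] x [-1, 1] with means (1/2, 0).
dist = em_distribution_independent([(0, 2), (-1, 1)], [Fraction(1, 2), 0])
total = sum(dist.values())
mean_back = [sum(v[i] * p for v, p in dist.items()) for i in range(2)]
print(dist, total, mean_back)
```

The two checks at the end mirror the text: the weights sum to one, and the mean of the vertex distribution reproduces $\bar\xi$, as stated in (3.9).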



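For the dependent case Frauendorfer [10] has derived

Proposition 3.5 If the components of $\xi$ are dependent, then the E-M distribution is

(3.11)   $P^\circ(a^\nu) = \frac{\prod_{i=1}^K (-1)^{\nu_i} (\alpha_{i\bar\nu_i} - \bar\xi_i)}{\prod_{i=1}^K (\alpha_{i1} - \alpha_{i0})} + \frac{\cdots}{\prod_{i=1}^K (\alpha_{i1} - \alpha_{i0})}$

where

$B$ is the set of all subsets of $\{1, \ldots, K\}$,

$\delta_\Lambda(\nu) := \prod_{i \in \Lambda} (-1)^{\nu_i}$, $\Lambda \in B$, and $\bar\Lambda := \{1, \ldots, K\} \setminus \Lambda$,

$m_\Lambda := \int_\Xi \big( \prod_{i \in \Lambda} \xi_i \big)\,P(d\xi)$, $\Lambda \in B$,

$\rho_\Lambda := m_\Lambda - \prod_{i \in \Lambda} \bar\xi_i$, $\Lambda \in B$.

Proof: see Frauendorfer [10]. □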



Remark 3.1 For the components $\xi_1, \ldots, \xi_K$ being independent we have $\rho_\Lambda = 0$
$\forall \Lambda \in B$. Hence, in this case (3.11) coincides with (3.10).

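Defining the class $\mathcal{P}$ of probability measures on $\Xi$ as the set of distributions
having the joint moments $m_\Lambda$, $\Lambda \in B$, i.e.

$\mathcal{P} := \{ P \mid \int_\Xi \prod_{i \in \Lambda} \xi_i\,P(d\xi) = m_\Lambda\ \ \forall \Lambda \in B \}$,

and introducing for probability measures the partial ordering $\preceq_c$ by

$P \preceq_c Q\ :\Longleftrightarrow\ \int \psi(\xi)\,P(d\xi) \le \int \psi(\xi)\,Q(d\xi)\ \ \forall$ convex $\psi : \Xi \to \mathbb{R}$

(Stoyan [31]), we see that $P^\circ \in \sup_{\preceq_c} \mathcal{P}$, i.e. $P^\circ$ solves the moment problem

$\max^{(\preceq_c)} \{ P \mid P \in \mathcal{P} \}$.

Moreover it was shown (Kall [18]) that $\sup_{\preceq_c} \mathcal{P}$ is a singleton, i.e. that

$\max^{(\preceq_c)} \{ P \mid P \in \mathcal{P} \}$ is uniquely determined by (3.11).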



□ 



4 Bounds on Simplices 

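If instead of $\Xi := \times_{i=1}^{K} [\alpha_{i0}, \alpha_{i1}]$ we have the simplex
$\Delta = \mathrm{conv}\,\{d_0, d_1, \ldots, d_K\}$ (with $d_0, d_1, \ldots, d_K$ being affinely independent)
containing supp $P$, then for any $\xi \in \Delta$ the system of linear equations

$p_0(\xi) + p_1(\xi) + \cdots + p_K(\xi) = 1$

$d_0\,p_0(\xi) + d_1\,p_1(\xi) + \cdots + d_K\,p_K(\xi) = \xi$

or briefly the system

$D\,p(\xi) = \begin{pmatrix} 1 \\ \xi \end{pmatrix}$,   $D = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ d_0 & d_1 & \cdots & d_K \end{pmatrix}$,

has the unique solution

$p(\xi) = D^{-1} \begin{pmatrix} 1 \\ \xi \end{pmatrix} \ge 0$.

Hence for $\varphi : \Delta \to \mathbb{R}$ being convex we have

$\varphi(\xi) \le \sum_{i=0}^K p_i(\xi)\,\varphi(d_i)$,

yielding the following version of the E-M inequality:

Proposition 4.1 For $\Delta \supset$ supp $P$ and $\varphi : \Delta \to \mathbb{R}$ being convex it holds

$\int_\Delta \varphi(\xi)\,P(d\xi) \le \sum_{i=0}^K \varphi(d_i)\,P_i^\Delta$,

where the E-M distribution on the vertices of $\Delta$ is given by

$P^\Delta = D^{-1} \begin{pmatrix} 1 \\ \bar\xi \end{pmatrix}$.

Therefore, to determine the E-M distribution requires just to solve the linear
equations

$D\,P^\Delta = \begin{pmatrix} 1 \\ \bar\xi \end{pmatrix}$

involving $\bar\xi$ but not the higher order mixed moments (see Frauendorfer [11]).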
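However, if $\Delta \supset \Xi \supset$ supp $P$, we have to expect for the E-M bounds that

$\sum_\nu \varphi(a^\nu)\,P^\circ(a^\nu) \le \sum_{i=0}^K \varphi(d_i)\,P_i^\Delta$.

Example 4.1 Assume that

$\Xi := [-1, 1] \times [-1, 1] \subset \Delta := \mathrm{conv}\,\{(3, -1)^T, (0, 2)^T, (-3, -1)^T\}$.

Assume further that $P$ is the uniform distribution on $\Xi$, and that $\varphi(\xi) = \xi_1^2 + \xi_2^2$.
Then we have

$D = \begin{pmatrix} 1 & 1 & 1 \\ 3 & 0 & -3 \\ -1 & 2 & -1 \end{pmatrix}$ and $D^{-1} = \begin{pmatrix} 1/3 & 1/6 & -1/6 \\ 1/3 & 0 & 1/3 \\ 1/3 & -1/6 & -1/6 \end{pmatrix}$,

yielding

$P^\Delta = D^{-1} (1, 0, 0)^T = (1/3, 1/3, 1/3)^T$.

On the other hand obviously $P^\circ(a^\nu) = 1/4$ $\forall \nu$. Hence with
$\varphi(a^\nu) = 2$ $\forall \nu$, $\varphi(d_0) = \varphi(d_2) = 10$, $\varphi(d_1) = 4$
we get $\sum_i \varphi(d_i)\,P_i^\Delta = 8$ whereas $\sum_\nu \varphi(a^\nu)\,P^\circ(a^\nu) = 2$.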
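The simplex bound of Prop. 4.1 only requires solving one small linear system. The Python sketch below reproduces the numbers of Example 4.1 with exact rational arithmetic. The helper names `solve_linear` and `simplex_em_distribution` are hypothetical, and the choice $\varphi(\xi) = \xi_1^2 + \xi_2^2$ is an assumption that is consistent with the vertex values 2, 10 and 4 quoted in the example.

```python
from fractions import Fraction

def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with exact rationals.
    Assumes A is square and nonsingular (affinely independent vertices)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[n] for row in M]

def simplex_em_distribution(vertices, mean):
    """Solve D p = (1, mean) for the E-M weights on the simplex vertices."""
    K = len(mean)
    D = [[1] * (K + 1)] + [[v[i] for v in vertices] for i in range(K)]
    return solve_linear(D, [1] + list(mean))

phi = lambda x: x[0] ** 2 + x[1] ** 2          # assumed integrand
simplex_vertices = [(3, -1), (0, 2), (-3, -1)]
square_vertices = [(1, -1), (1, 1), (-1, 1), (-1, -1)]

p_delta = simplex_em_distribution(simplex_vertices, (0, 0))
simplex_bound = sum(p * phi(d) for p, d in zip(p_delta, simplex_vertices))
square_bound = sum(Fraction(1, 4) * phi(a) for a in square_vertices)
print(p_delta, simplex_bound, square_bound)
```

The comparison shows the trade-off discussed in the text: the simplex bound needs only first moments but is noticeably weaker here than the bound on the interval.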



5 Generalized Moment Problems 

Observe that for a $K$-dimensional interval $\Xi$ as well as for a simplex $\Delta$ the
E-M distribution does not depend on the particular convex function $\varphi$. However

- we need the first moments $\bar\xi$ to compute $P^\Delta$ on $\Delta$;

- we need all mixed moments $m_\Lambda$ to find $P^\circ$ on $\Xi$;

- for supp $P \subset \Xi \subset \Delta$ holds for the E-M upper bounds
$\sum_\nu \varphi(a^\nu)\,P^\circ(a^\nu) \le \sum_i \varphi(d_i)\,P_i^\Delta$ for any convex $\varphi$.



To avoid the computation of the $2^K$ mixed moments, we may find another
upper bound as follows:

For a probability measure $P$ with $E_P\,\xi = \bar\xi$ and supp $P$ being contained in
any convex polyhedron

$B := \mathrm{conv}\,\{v_0, v_1, \ldots, v_r\} \subset \mathbb{R}^K$ ($v_i$ the vertices),

such that $\bar\xi \in \mathrm{int}\,B$, consider the class of probability measures

$\mathcal{P}_B := \{ P \mid P(B) = 1,\ E_P\,\xi = \bar\xi \}$.

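Due to well known duality statements for semi-infinite LP's and their dual
(generalized) moment problems (Karlin-Studden [23] and Krein-Nudel'man
[25]) we have that for any convex continuous function $\varphi : B \to \mathbb{R}$

$\max_{P \in \mathcal{P}_B} E_P\,\varphi(\xi) = \min_t \{ \sum_{i=1}^K t_i \bar\xi_i + t_0 \mid \sum_{i=1}^K t_i \xi_i + t_0 \ge \varphi(\xi)\ \ \forall \xi \in B \}$.

Defining $t := (t_0, t_1, \ldots, t_K)^T$ we get due to the convexity of $\varphi$ (see Dupačová
[4, 5, 6])

Proposition 5.1 An upper bound $\Psi^* := \max_{P \in \mathcal{P}_B} E_P\,\varphi$ and the corresponding
extremal distribution $P^* = \{p_0^*, \ldots, p_r^*\}$ (on the vertices of $B$) result from
solving

(5.1)   $\Psi^* = \min_t \{ t_0 + \sum_{i=1}^K t_i \bar\xi_i \mid t_0 + \sum_{i=1}^K t_i v_{ji} \ge \varphi(v_j),\ j = 0, \ldots, r \}$
        $= \max \{ \sum_{j=0}^r p_j\,\varphi(v_j) \mid \sum_{j=0}^r p_j v_j = \bar\xi,\ \sum_{j=0}^r p_j = 1,\ p_j \ge 0\ \forall j \}$.

$\Psi^*$ and $P^*$ depend on $\bar\xi$ and on $\varphi$.

If in particular $B$ is an interval $\Xi = \mathrm{conv}\,\{v_0, \ldots, v_{2^K - 1}\}$, then the E-M
distribution $P^\circ$ is feasible in (5.1) and hence $E_P\,\varphi(\xi) \le \sum_\nu \varphi(v_\nu)\,P^\circ(v_\nu) \le \Psi^*$. But
solving the LP (5.1) may be cheaper than computing all mixed moments.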

Example 5.1 As in Example 4.1 we consider
$\Xi := [-1, 1] \times [-1, 1]$ with the uniform distribution.

Numbering the vertices counter-clockwise, starting with $v_0 = (1, -1)^T$, we
know that $P_0^\circ = \cdots = P_3^\circ = 1/4$.
Choosing $\varphi(\xi) := \xi_1^2 + \xi_1 \xi_2 + \xi_2^2$ we get

$\varphi(v_j) = \begin{cases} 1 & \text{for } j = 0 \\ 3 & \text{for } j = 1 \\ 1 & \text{for } j = 2 \\ 3 & \text{for } j = 3 \end{cases}$

and hence the E-M bound $\sum_j \varphi(v_j)\,P_j^\circ = 2$, whereas with $p_1^* = p_3^* = 1/2$ follows $\Psi^* = 3$. □
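For small instances the maximization form of (5.1) can be solved by enumerating basic solutions, i.e. by trying every choice of $K + 1$ vertices, solving for the weights and keeping the best nonnegative solution. The Python sketch below does this for Example 5.1. It is a brute-force illustration, not a method proposed in the text; the function names are hypothetical, and a practical implementation would call an LP solver instead.

```python
from fractions import Fraction
from itertools import combinations

def solve_linear(A, b):
    """Solve A x = b exactly (Gauss-Jordan); return None if A is singular."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[n] for row in M]

def moment_upper_bound(vertices, phi, mean):
    """Maximize sum_j p_j phi(v_j) s.t. sum_j p_j v_j = mean, sum_j p_j = 1,
    p >= 0, by enumerating all bases of K + 1 vertices."""
    K = len(mean)
    best_value, best_dist = None, None
    for subset in combinations(range(len(vertices)), K + 1):
        A = [[1] * (K + 1)] + [[vertices[j][i] for j in subset] for i in range(K)]
        p = solve_linear(A, [1] + list(mean))
        if p is None or any(pj < 0 for pj in p):
            continue  # singular basis or infeasible weights
        value = sum(pj * phi(vertices[j]) for pj, j in zip(p, subset))
        if best_value is None or value > best_value:
            best_value = value
            best_dist = {vertices[j]: pj for pj, j in zip(p, subset) if pj > 0}
    return best_value, best_dist

vertices = [(1, -1), (1, 1), (-1, 1), (-1, -1)]   # v_0, ..., v_3
phi = lambda x: x[0] ** 2 + x[0] * x[1] + x[1] ** 2
psi_star, p_star = moment_upper_bound(vertices, phi, (0, 0))
em_bound = sum(Fraction(1, 4) * phi(v) for v in vertices)
print(psi_star, p_star, em_bound)
```

The extremal distribution puts weight 1/2 on each of $v_1$ and $v_3$, and the resulting bound is weaker than the E-M bound, in line with the ordering stated before the example.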

The particular case, to construct an upper bound $\Psi^*$ for $E_P\,\varphi(\xi)$ by solving
the moment problem $\max_{P \in \mathcal{P}_B} E_P\,\varphi$ under the moment conditions
$\mathcal{P}_B := \{P \mid P(B) = 1,\ E_P\,\xi = \bar\xi\}$ to get $P^*$, can be generalized in principle. This
involves the duality statements for semi-infinite LP's mentioned above.

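For probability measures $P$ with supp $P \subset \Theta \subset \mathbb{R}^K$ let

$a : \Theta \to \mathbb{R}^M$, $\varphi : \Theta \to \mathbb{R}$

be measurable and $D := \mathrm{conv}\,a(\Theta)$, $m \in \mathbb{R}^M$. Then for the pair of dual
problems

(5.2)   $v_{\mathrm{prim}} := \inf_{y_0, y} \{ y_0 + m^T y \mid y_0 + a(\xi)^T y \ge \varphi(\xi)\ \ \forall \xi \in \Theta \}$,
        $v_{\mathrm{dual}} := \sup_P \{ \int_\Theta \varphi(\xi)\,P(d\xi) \mid \int_\Theta a(\xi)\,P(d\xi) = m,\ \int_\Theta P(d\xi) = 1 \}$

we have due to Kemperman [24]

Theorem 5.1 If $m \in \mathrm{int}\,D$ then $v_{\mathrm{prim}} = v_{\mathrm{dual}}$, and for $v_{\mathrm{dual}} < \infty$, $v_{\mathrm{prim}}$ is
attained for some $(y_0, y^T)^T$. The conditions

i) $v_{\mathrm{dual}}$ is attained;

ii) $\exists$ a primal feasible $(y_0, y^T)^T$ such that $\exists\,\xi^i \in \Theta$, $i = 1, \ldots, \nu$, $\nu \ge 1$, for
which $y_0 + a(\xi^i)^T y = \varphi(\xi^i)$ $\forall i$ and $m \in \mathrm{conv}\,\{a(\xi^1), \ldots, a(\xi^\nu)\}$,

are equivalent.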

From Richter [27] and Rogosinski [30] (see also Kemperman [24]) we know

Theorem 5.2 For $f_1, \ldots, f_N$ being integrable functions on the probability
space $(\Omega, \mathcal{F}, P)$, there exists a probability measure $\tilde P$ with finite support in $\Omega$
such that

$\int_\Omega f_i(\omega)\,\tilde P(d\omega) = \int_\Omega f_i(\omega)\,P(d\omega)$, $i = 1, \ldots, N$,

with card (supp $\tilde P$) $\le N + 1$.

Remark 5.1 Due to Theor. 5.2 we may restrict ourselves to discrete distributions
on $\Theta$. If we can satisfy condition ii) of Theor. 5.1, we have solved
both problems in (5.2) at once since

$m \in \mathrm{conv}\,\{a(\xi^1), \ldots, a(\xi^\nu)\}\ \Longrightarrow\ \exists\,p_i \ge 0:\ m = \sum_{i=1}^\nu a(\xi^i)\,p_i,\ \sum_{i=1}^\nu p_i = 1$,

i.e. $(\xi^1, \ldots, \xi^\nu;\ p_1, \ldots, p_\nu)$ is feasible for the dual in (5.2), implying

$v_{\mathrm{dual}} \ge \sum_{i=1}^\nu \varphi(\xi^i)\,p_i = \sum_{i=1}^\nu (y_0 + a(\xi^i)^T y)\,p_i = y_0 + m^T y \ge v_{\mathrm{prim}}$,

whereas by weak duality $v_{\mathrm{dual}} \le v_{\mathrm{prim}}$. □

As an instance for this approach consider a distribution $P$ with

$\mu := \int \xi\,P(d\xi)$ and $\rho := \int \|\xi\|^2\,P(d\xi)$,

$\|\cdot\|$ the Euclidean norm. Then choose for (5.2)

$a(\xi) := \begin{pmatrix} \xi \\ \|\xi\|^2 \end{pmatrix}$ and $m := \begin{pmatrix} \mu \\ \rho \end{pmatrix}$,

these moment conditions having been considered first by Dula [3] for simplicial
functions $\varphi$. The assumption of Theor. 5.1 may be checked (see Kall [20])
by

Lemma 5.1 With $a$, $m$ as above and $D := \mathrm{conv}\,a(\mathbb{R}^K)$, it holds

$m \in \mathrm{int}\,D \iff \rho > \|\mu\|^2$.

Hence Theor. 5.1 applies iff the distribution $P$ is not completely degenerate,
i.e. iff var $(\xi_i) > 0$ for at least one $i$.

Assuming that $\varphi$ is nonlinear and is determined by

(5.3)   $\varphi(\xi) = \max_{1 \le j \le r} (d_j^T \xi - f_j)$,

as it holds for the recourse function $Q(x, \xi) := \min_y \{ q^T y \mid Wy = h(\xi) - T(\xi)x,\ y \ge 0 \}$
for any fixed $x$, the primal problem in (5.2) can be shown to
be the convex¹ solvable program

(5.4)   $v_{\mathrm{prim}} = \inf \{ y_0 + \mu^T y + \rho\,y_{K+1} \}$
        s.t. $4 y_0 y_{K+1} + 4 f_j y_{K+1} - \|d_j - y\|^2 \ge 0$, $j = 1, \ldots, r$,
             $y_{K+1} \ge 0$.

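For $(y_0, y, y_{K+1})$ solving (5.4), with $J \subset \{1, \ldots, r\}$ denoting the active
constraints, follows (see Kall [20])

¹ i.e. convex objective and convex feasible set

Proposition 5.2 For $\xi^{(j)}$ determined as

$\xi^{(j)} := \frac{d_j - y}{2\,y_{K+1}}$, $j \in J$,

holds

$\begin{pmatrix} \mu \\ \rho \end{pmatrix} \in \mathrm{conv}\,\left\{ \begin{pmatrix} \xi^{(j)} \\ \|\xi^{(j)}\|^2 \end{pmatrix},\ j \in J \right\}$.

Hence, by Theor. 5.1, with $p_j$ solving

$\sum_{j \in J} p_j = 1$
$\sum_{j \in J} p_j\,\xi^{(j)} = \mu$
$\sum_{j \in J} p_j\,\|\xi^{(j)}\|^2 = \rho$
$p_j \ge 0$, $j \in J$

we get the upper bound

$\sum_{j \in J} \varphi(\xi^{(j)})\,p_j \ge \int \varphi(\xi)\,P(d\xi)$.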



Example 5.2 Let P be the uniform distribution on 
such that 



p = 




and p = 



1 



With 



(fiO = max{^i,^2,-6,-6} 



and the notation of (5.3) we have /^ = 0 Vz and 



-1,1] X [-1,1] 



di 



1 

0 



d2 



0 

1 



Now problem (5.4) reads as 



d3 - ( 0^ ) , di=(^^^y 



'^prim 

S.t. 



= inf{yo + I2/3} 

42/02/3 > ( 1 - 2 / 1 )^+ 2/2 

42/02/3 > 2/1 + (1 - 2 / 2 )^ 

42/02/3 > (1 + 2 / 1 )^+ 2/2 

42/0^3 > y? + (1 + V2)^ 

ys >0 



yo - yi-y2= 0, m = 2 '^- 



having the solution
From Prop. 5.2 we get the 'tangent points'




and then we have to solve the linear system 



n. 



0 

-1 



EjsjPj = 

Pj > 



1 




1 

3 

0, j e J, 



yielding Pj = |, j = 

Hence we get the upper bound 



4 1 

Jl<p{i^’^)Pj = ^ = 0.57735, 



whereas the E-M bound would yield (with $p_i = 1/4$ $\forall i$, $a^i$ the vertices of $\Xi$)

$\sum_i \varphi(a^i)\,p_i = 1$.



For more on bounds and related moment problems see e.g. Edirisinghe and
Ziemba [7, 8], Gassmann and Ziemba [12], Huang et al. [14], and Kall [19, 20].

6 Improving Bounds 

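Consider again the interval $\Xi := \times_{i=1}^{K} [\alpha_{i0}, \alpha_{i1}] \subset \mathbb{R}^K$ containing supp $P$,
and a convex function $\varphi : \Xi \to \mathbb{R}$. Using the Jensen (3.1) and the E-M (3.7)
inequalities we know that

$\varphi(\bar\xi) \le \int_\Xi \varphi(\xi)\,P(d\xi) \le \sum_\nu \varphi(a^\nu)\,P^\circ(a^\nu)$,

where linearity of $\varphi$ on $\Xi$ implies equality on both sides. Dividing $[\alpha_{10}, \alpha_{11}]$
at some $\beta \in (\alpha_{10}, \alpha_{11})$ into the two disjoint parts $[\alpha_{10}, \beta)$ and $[\beta, \alpha_{11}]$, we
get the two disjoint intervals $\Xi^1 := [\alpha_{10}, \beta) \times \times_{i=2}^{K} [\alpha_{i0}, \alpha_{i1}]$ and
$\Xi^2 := [\beta, \alpha_{11}] \times \times_{i=2}^{K} [\alpha_{i0}, \alpha_{i1}]$ with $\Xi^1 \cup \Xi^2 = \Xi$, i.e. we have partitioned $\Xi$ into
two intervals.

Using the conditional distributions $P^{(i)} := P(\cdot \mid \Xi^i)$ and conditional
expectations $\bar\xi^{(i)} := E(\xi \mid \Xi^i)$, $i = 1, 2$, we get from the Jensen and E-M
inequalities

$\varphi(\bar\xi^{(i)}) \le \int_{\Xi^i} \varphi(\xi)\,P^{(i)}(d\xi) \le \sum_\nu \varphi(a^{(i)\nu})\,P^{(i)\circ}(a^{(i)\nu})$, $i = 1, 2$,

where $a^{(i)\nu}$ are the vertices of $\Xi^i$ and $P^{(i)\circ}$ are the E-M probabilities
derived from $P(\cdot \mid \Xi^i)$. With $p_i := P(\Xi^i)$ we have $\bar\xi = p_1 \bar\xi^{(1)} + p_2 \bar\xi^{(2)}$ and
hence due to Jensen

$\varphi(\bar\xi) \le p_1\,\varphi(\bar\xi^{(1)}) + p_2\,\varphi(\bar\xi^{(2)})$
$\le p_1 \int_{\Xi^1} \varphi(\xi)\,P^{(1)}(d\xi) + p_2 \int_{\Xi^2} \varphi(\xi)\,P^{(2)}(d\xi)$.

Further it is obvious that

$\int_\Xi \varphi(\xi)\,P(d\xi) = p_1 \int_{\Xi^1} \varphi(\xi)\,P^{(1)}(d\xi) + p_2 \int_{\Xi^2} \varphi(\xi)\,P^{(2)}(d\xi)$
$\le p_1 \sum_\nu \varphi(a^{(1)\nu})\,P^{(1)\circ}(a^{(1)\nu}) + p_2 \sum_\nu \varphi(a^{(2)\nu})\,P^{(2)\circ}(a^{(2)\nu})$.

And finally, due to (3.9) we have $\sum_\nu a^{(i)\nu}\,P^{(i)\circ}(a^{(i)\nu}) = \bar\xi^{(i)}$, and hence, since
$\bar\xi = p_1 \bar\xi^{(1)} + p_2 \bar\xi^{(2)}$, from E-M follows

$p_1 \sum_\nu \varphi(a^{(1)\nu})\,P^{(1)\circ}(a^{(1)\nu}) + p_2 \sum_\nu \varphi(a^{(2)\nu})\,P^{(2)\circ}(a^{(2)\nu}) \le \sum_\nu \varphi(a^\nu)\,P^\circ(a^\nu)$.

Putting these together we have

Proposition 6.1 Partitioning the interval $\Xi$ into $\Xi^1$ and $\Xi^2$, the lower bound
(Jensen) is increased and the upper bound (E-M) is decreased to

(6.1)   $p_1\,\varphi(\bar\xi^{(1)}) + p_2\,\varphi(\bar\xi^{(2)}) \le \int_\Xi \varphi(\xi)\,P(d\xi)$
        $\le p_1 \sum_\nu \varphi(a^{(1)\nu})\,P^{(1)\circ}(a^{(1)\nu}) + p_2 \sum_\nu \varphi(a^{(2)\nu})\,P^{(2)\circ}(a^{(2)\nu})$.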

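Example 6.1 For $\Xi := [-1, 1] \times [-1, 1]$ with $P$ the uniform distribution
consider $\varphi(\xi) := \max\{ 3\xi_1 - \xi_2,\ -2\xi_1 + \xi_2 \}$.

Then $\bar\xi = (0; 0)^T$ and the E-M probabilities for

$a^1 := (1; -1)^T$, $a^2 := (1; 1)^T$, $a^3 := (-1; 1)^T$, $a^4 := (-1; -1)^T$

are $P^\circ(a^\nu) = 1/4$ $\forall \nu$ with $\varphi(a^1) = 4$, $\varphi(a^2) = 2$, $\varphi(a^3) = 3$ and $\varphi(a^4) = 1$.
Hence we have

$\varphi(\bar\xi) = 0 \le \int_\Xi \varphi(\xi)\,P(d\xi) \le \sum_{\nu=1}^{4} \varphi(a^\nu)\,P^\circ(a^\nu) = 5/2$.

Partitioning $\Xi$ at $\xi_1 = 0$ into $\Xi^A$ and $\Xi^B$ with $p_A = P(\Xi^A) = p_B = P(\Xi^B) = 1/2$ we get

Figure 2: Vertical partition

$\bar\xi^A = (1/2, 0)^T$, $\bar\xi^B = (-1/2, 0)^T$, $P^{(A)\circ}(a^{(A)\nu}) = P^{(B)\circ}(a^{(B)\nu}) = 1/4$ $\forall \nu$, and
with the above values $\varphi(a^\nu)$ and $\varphi(0, 1) = \varphi(0, -1) = 1$ follows

$\varphi(\bar\xi^A) = 3/2$, $\varphi(\bar\xi^B) = 1$,

and according to (6.1)

$5/4 \le \int_\Xi \varphi(\xi)\,P(d\xi) \le 7/4$.

Partitioning $\Xi$ at $\xi_2 = 0$ into $\Xi^I$ and $\Xi^{II}$ with $p_I = P(\Xi^I) = p_{II} = P(\Xi^{II}) = 1/2$ we get

Figure 3: Horizontal partition

$\bar\xi^I = (0, 1/2)^T$, $\bar\xi^{II} = (0, -1/2)^T$, $P^{(I)\circ}(a^{(I)\nu}) = P^{(II)\circ}(a^{(II)\nu}) = 1/4$ $\forall \nu$, and
with $\varphi(a^\nu)$ as above and $\varphi(1, 0) = 3$, $\varphi(-1, 0) = 2$
follows

$\varphi(\bar\xi^I) = 1/2$, $\varphi(\bar\xi^{II}) = 1/2$

and according to (6.1)

$1/2 \le \int_\Xi \varphi(\xi)\,P(d\xi) \le 5/2$.

□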
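The bounds (6.1) for Example 6.1 can be checked numerically. The Python sketch below computes the Jensen and E-M bounds on a given partition into rectangular cells. It assumes a uniform distribution, so that each cell's conditional mean is its centre and, by (3.10), its E-M weights are equal on the cell's vertices. The function name `bounds_on_partition` and the cell encoding are hypothetical.

```python
from fractions import Fraction
from itertools import product

def bounds_on_partition(phi, cells, total_area):
    """Jensen lower and E-M upper bound for E[phi(xi)], xi uniform on the
    union of the rectangular cells; each cell is ((lo_1, hi_1), (lo_2, hi_2))."""
    lower = upper = Fraction(0)
    for cell in cells:
        area = Fraction(1)
        for lo, hi in cell:
            area *= (hi - lo)
        weight = area / total_area                  # p_i = P(cell)
        center = tuple(Fraction(lo + hi, 2) for lo, hi in cell)
        vertices = list(product(*cell))
        lower += weight * phi(center)               # Jensen on the cell
        upper += weight * Fraction(sum(phi(v) for v in vertices), len(vertices))
    return lower, upper

phi = lambda x: max(3 * x[0] - x[1], -2 * x[0] + x[1])

whole = bounds_on_partition(phi, [((-1, 1), (-1, 1))], 4)
vertical = bounds_on_partition(phi, [((0, 1), (-1, 1)), ((-1, 0), (-1, 1))], 4)
horizontal = bounds_on_partition(phi, [((-1, 1), (0, 1)), ((-1, 1), (-1, 0))], 4)
print(whole, vertical, horizontal)
```

For this integrand the vertical cut tightens both bounds, while the horizontal cut raises only the lower bound, so the choice of the coordinate to split matters.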

If $\nabla\varphi(a^\nu) = \nabla\varphi(a^\mu)$ $\forall \nu \ne \mu$, this implies linearity of $\varphi$ on $\Xi$ (by convexity).
Due to Props. 3.1 and 3.3, then

$\varphi(\bar\xi) = \int_\Xi \varphi(\xi)\,P(d\xi) = \sum_\nu \varphi(a^\nu)\,P^\circ(a^\nu)$,

and hence we are done.

7 Convergence

We started with $S_0 := \{\Xi_0\} := \{\Xi\}$ and constructed a partition $S_1 := \{\Xi_1^1, \Xi_1^2\}$.
We may continue by partitioning the same way $\Xi_1^1$ and/or $\Xi_1^2$, getting
a refined partition $S_2$ (i.e. for any element $\Xi_2^j \in S_2$ exists an element
$\Xi_1^i \in S_1$ such that $\Xi_2^j \subset \Xi_1^i$). Thus we can construct a sequence of successive
refinements $\{S_k\}$. With $\delta(\Xi_k^j) := \sup \{ \|\xi - \eta\| \mid \xi, \eta \in \Xi_k^j \}$ we define the
width of the partition $S_k$ as $\delta(S_k) := \max \{ \delta(\Xi_k^j) \mid \Xi_k^j \in S_k \}$. Hence $\{\delta(S_k)\}$
is a monotonically decreasing sequence.

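Theorem 7.1 Assume that

- supp $P \subset \Xi$,

- that $\varphi : \Xi \to \mathbb{R}$ be continuous, convex and

- that $\{S_k\}$ ($S_0 = \{\Xi\}$) be a sequence of successively refined partitions
such that $\delta(S_k) \to 0$.

Then for the corresponding sequences of Jensen distributions $\{P^{(k)J}\}$ and
of E-M distributions $\{P^{(k)\circ}\}$ holds

(7.1)   $\int_\Xi \varphi(\xi)\,P^{(k)J}(d\xi) \to \int_\Xi \varphi(\xi)\,P(d\xi)$,
        $\int_\Xi \varphi(\xi)\,P^{(k)\circ}(d\xi) \to \int_\Xi \varphi(\xi)\,P(d\xi)$.

Proof: Obviously $\varphi$ is uniformly continuous on $\Xi$. Hence,

$\forall \varepsilon\ \exists\,\delta_\varepsilon:\ |\varphi(\xi) - \varphi(\eta)| < \varepsilon\ \ \forall \xi, \eta \in \Xi:\ \|\xi - \eta\| < \delta_\varepsilon$.

Since $\delta(S_k) \to 0$, there

$\exists\,N(\delta_\varepsilon):\ \delta(S_k) < \delta_\varepsilon\ \ \forall k > N(\delta_\varepsilon)$

such that for any $k > N(\delta_\varepsilon)$ and arbitrary $\Xi_k^j \in S_k$ holds

$\sup_{\xi, \eta \in \Xi_k^j} |\varphi(\xi) - \varphi(\eta)| < \varepsilon$.

Observing that

$\int_{\Xi_k^j} P^{(k)J}(d\xi) = \int_{\Xi_k^j} P^{(k)\circ}(d\xi) = \int_{\Xi_k^j} P(d\xi)\ \ \forall j$

and defining the simple functions $\underline\varphi_k, \overline\varphi_k : \Xi \to \mathbb{R}$ by

$\underline\varphi_k(\xi) := \inf_{\eta \in \Xi_k^j} \varphi(\eta)$, $\overline\varphi_k(\xi) := \sup_{\eta \in \Xi_k^j} \varphi(\eta)$ for $\xi \in \Xi_k^j$,

for which $0 \le \overline\varphi_k(\xi) - \underline\varphi_k(\xi) < \varepsilon$ $\forall \xi \in \Xi$ and $\forall k > N(\delta_\varepsilon)$, we get, with
$P^{(k)\circ}$ attaining (only) vertices within the partition $S_k$,

$\int_\Xi \underline\varphi_k(\xi)\,P(d\xi) = \int_\Xi \underline\varphi_k(\xi)\,P^{(k)J}(d\xi) \le \int_\Xi \varphi(\xi)\,P^{(k)J}(d\xi) \le \int_\Xi \varphi(\xi)\,P(d\xi)$
$\le \int_\Xi \varphi(\xi)\,P^{(k)\circ}(d\xi) \le \int_\Xi \overline\varphi_k(\xi)\,P^{(k)\circ}(d\xi) = \int_\Xi \overline\varphi_k(\xi)\,P(d\xi)$.

Hence

$\int_\Xi \varphi(\xi)\,P(d\xi) - \int_\Xi \varphi(\xi)\,P^{(k)J}(d\xi)$ and $\int_\Xi \varphi(\xi)\,P^{(k)\circ}(d\xi) - \int_\Xi \varphi(\xi)\,P(d\xi)$

are bounded by

$\int_\Xi \overline\varphi_k(\xi)\,P(d\xi) - \int_\Xi \underline\varphi_k(\xi)\,P(d\xi) = \int_\Xi [\overline\varphi_k(\xi) - \underline\varphi_k(\xi)]\,P(d\xi) < \varepsilon$.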

□ 
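The convergence stated in Theor. 7.1 can be observed on a one-dimensional toy case. The Python sketch below refines $[-1, 1]$ into $2^k$ equal cells and records the gap between the E-M and Jensen bounds. The uniform distribution, the integrand $\varphi(\xi) = \xi^2$ and the function name `refined_bounds` are assumptions made for illustration.

```python
from fractions import Fraction

def refined_bounds(phi, lo, hi, n_cells):
    """Jensen and E-M bounds for E[phi(xi)], xi uniform on [lo, hi],
    on a partition into n_cells subintervals of equal width."""
    width = Fraction(hi - lo, n_cells)
    weight = Fraction(1, n_cells)          # probability of each cell
    lower = upper = Fraction(0)
    for j in range(n_cells):
        a = lo + j * width
        b = a + width
        lower += weight * phi((a + b) / 2)       # conditional mean is the midpoint
        upper += weight * (phi(a) + phi(b)) / 2  # E-M weights 1/2 on each endpoint
    return lower, upper

phi = lambda t: t * t
exact = Fraction(1, 3)                     # E[xi^2] for xi uniform on [-1, 1]
results = [refined_bounds(phi, -1, 1, 2 ** k) for k in range(5)]
gaps = [upper - lower for lower, upper in results]
print(gaps)
```

In this example the gap shrinks by a factor of four with every halving of the cell width, and the exact value stays between the two bounds at every refinement level.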

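Theor. 7.1 states, in the terminology of Billingsley [1], the weak convergence
of the Jensen distributions $\{P^{(k)J}\}$ and the E-M distributions $\{P^{(k)\circ}\}$ to $P$. For the
recourse function $Q(x, \xi)$ as defined in Sec. 1 this implies the epi-convergence of
$\underline{Q}_k(x) := \int Q(x, \xi)\,P^{(k)J}(d\xi)$ and
$\overline{Q}_k(x) := \int Q(x, \xi)\,P^{(k)\circ}(d\xi)$ to $\mathcal{Q}(x) = \int Q(x, \xi)\,P(d\xi)$, which in turn
ensures that

$\min_{x \in X} \{ c^T x + \underline{Q}_k(x) \} \to \min_{x \in X} \{ c^T x + \mathcal{Q}(x) \}$,
$\min_{x \in X} \{ c^T x + \overline{Q}_k(x) \} \to \min_{x \in X} \{ c^T x + \mathcal{Q}(x) \}$,

and for arbitrary accumulation points $\underline{x}$ and $\overline{x}$ of
$\{ \underline{x}^{(k)} \mid \underline{x}^{(k)} \in \arg\min_{x \in X} [c^T x + \underline{Q}_k(x)] \}$ and
$\{ \overline{x}^{(k)} \mid \overline{x}^{(k)} \in \arg\min_{x \in X} [c^T x + \overline{Q}_k(x)] \}$, respectively, hold
$\underline{x} \in \arg\min_{x \in X} [c^T x + \mathcal{Q}(x)]$ and $\overline{x} \in \arg\min_{x \in X} [c^T x + \mathcal{Q}(x)]$.

(See e.g. Birge-Wets [2], Kall [16, 17], Robinson [28], Robinson-Wets [29] and
Wets [32]).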
Abstract: A power generation system comprising thermal and pumped-storage
hydro plants is considered. Two kinds of models for the cost-optimal
generation of electric power under uncertain load are introduced: (i) a dynamic
model for the short-term operation and (ii) a power production planning
model. In both cases, the presence of stochastic data in the optimization
model leads to multi-stage and two-stage stochastic programs, respectively.
Both stochastic programming problems involve a large number of mixed-integer
(stochastic) decisions, but their constraints are loosely coupled across
operating power units. This is used to design Lagrangian relaxation methods
for both models, which lead to a decomposition into stochastic single unit
subproblems. For the dynamic model a Lagrangian decomposition based
algorithm is described in more detail. Special emphasis is put on a discussion
of the duality gap, the efficient solution of the multi-stage single unit
subproblems and on solving the dual problem by bundle methods for convex
nondifferentiable optimization.



Keywords: hydro-thermal power system, uncertain load, stochastic 
programming, multi-stage, two-stage, mixed-integer, 
Lagrangian relaxation, bundle methods 
1 Introduction

The efficient operation and planning of electric power generation systems play
an important role for electric utilities as well as for human activity as a whole.
On the one hand, the efficient use of the available fuel for the production
of electrical energy is of growing importance, both monetarily and because
most of the primary energy sources on which today's energy supply is based
are not renewable and are limited in supply. Savings of a small percentage
in the operation of a moderately large power system represent a significant
reduction in operation cost as well as in the quantities of fuel consumed.
On the other hand, in the future, the human community and, in particular,
the power supply industry will be confronted with general economic and
ecological conditions that are partly contradictory and aggravating. Some of
these conditions are the rise in global energy demand, the scarcity of essential
resources and the limits to local and global environmental damage.
Another contemporary challenge for the electric utility industry arises from
the changes of market structures for electric power. There has been a worldwide
movement towards deregulation of the electric utility industry and an
opening of the market to nonutility participants. Moreover, there are plans
to open the use of the transmission system in the European Community.
All this has led and will further lead to a growth of the number and size
of energy transactions. This development raises questions about the prices
involved, which are based on market actions rather than on costs as in traditional
delivery contracts.

*This research is supported by the Schwerpunktprogramm "Echtzeit-Optimierung
großer Systeme" of the Deutsche Forschungsgemeinschaft

These issues have motivated a growing interest in applying mathematical 
modelling and optimization techniques for optimal system operation. In- 
deed, there is already a long tradition for applying mathematical program- 
ming methods and software to the solution of many relevant engineering 
problems (e. g. economic dispatch and unit commitment; see [67], [69] and 
the references therein). The recent substantial progress in many areas of 
mathematical optimization (e. g. in linear, mixed-integer, nonlinear, nondif- 
ferentiable and stochastic programming) opens the road to solving more and 
more involved models (e. g. [22]). Such complex and large optimization mod- 
els arise, for instance, for the optimal operation of a hydro-thermal system 
when including additional aspects like data uncertainty, other regenerative 
sources of energy, the mid-term management of reservoirs, electricity trad- 
ing etc. Models of this type are usually characterized by a combination of 
several difficulties like continuous as well as binary decision variables, very 
large dimension, nonlinearities (e. g. in hydro modelling, fuel costs, price 
structures in fuel as well as in electricity purchases) and the uncertainty of 
problem data (e. g. uncertainty of load forecasts, streamflows to reservoirs,
pricing schemes, generator failures etc.). 

The present paper aims, in particular, at applying a mathematical methodol- 
ogy, called stochastic programming, for handling uncertain data in optimiza- 
tion models. Stochastic programming is mostly concerned with problems 
that require a here-and-now decision on the basis of given probabilistic in- 
formation on random quantities, but without making further observations. 
Possible formulations of stochastic programming models depend on when 
decisions must be taken relative to the realization of the random variables 
(e. g. at several stages in a dynamic model), the degree to which the constraint
structure must be satisfied (e. g. with some probability), and the
choice of the (stochastic) objective function (e. g. expected costs).
Stochastic programming approaches for tackling models in electric power 
generation under uncertainty have already found considerable attention (cf. 
chapters 24-26 in [17] for earlier works). We briefly mention here some of 
the recent and relevant works in this direction. A multi-stage stochastic op- 
timization model for the optimal scheduling of a hydro-thermal generation 
system with uncertain inflows is developed in [51]. The authors present a 
solution strategy based on Benders decomposition and test results for a sys- 
tem comprising 39 hydroelectric plants, one aggregate thermal unit and a 
yearly planning period with monthly stages. The paper [10] offers an aug- 
mented Lagrangian decomposition technique for scheduling power systems 
under random disturbances which are modelled by scenario trees. In [32] a 
multi-stage stochastic program for scheduling hydroelectric generation under 
uncertainty is described and solved by an enhanced version of nested Benders 
decomposition. The paper also reports on the generation of monthly stream- 
flow scenario trees and on model validation in the user’s environment of the 
Pacific Gas & Electric Company. In [11] stochastic programming techniques 
based on Benders decomposition and importance sampling are applied to 
the facility expansion planning of electric power systems under uncertainty 
of the availability of generators and transmission lines, and on the demand. 
Schemes for the pricing of electric power, which is subject to demand and sup- 
ply uncertainties, are designed and compared in [31] by means of a two-stage 
stochastic recourse model. The following papers deal with power scheduling 
under uncertain load. A two-stage stochastic program with simple recourse 
for the daily economic dispatch in a thermal power system is developed and 
solved in [8] under the assumption that the marginal distributions of the load 
are normal. In [25] and [26], this model is extended to power systems compris- 
ing thermal and pumped-storage hydro units and general load distributions. 
The extended model is solved by combining a smooth nonparametric estima- 
tion procedure for the marginal load distributions with standard nonlinear 
programming methods and it is validated by solving the daily economic dis- 
patch problem of a system involving 24 thermal and 5 pumped-storage plants. 
Further extensions of the latter model by allowing for more general dynam- 
ics between decision and observation and for more appropriate recourse cost 
functions are discussed in [23] and [58]. These models do not yet include 
start-up and shut-down decisions into the optimization process. This is real- 
ized in [65], where a stochastic unit commitment problem for a thermal power 
system and a corresponding solution technique based on progressive hedging 
are developed. The progressive hedging methodology (cf. [57]) leads to a 
successive decomposition into scenario subproblems, which are deterministic 
unit commitment problems, and solved by Lagrangian relaxation and by an 
adapted subgradient method for dual maximization. In [66], the authors re- 
port on encouraging test runs for large real-life models. 

The present paper aims at the development of two kinds of models for the
cost-optimal scheduling of electric power in a hydro-thermal generation system
under uncertain load: a dynamic stochastic recourse model for the short-term
operation and a two-stage stochastic production planning model. Both
models are further extensions of the stochastic models described in [25], [23]
and [58]. They represent mixed-integer stochastic optimization problems
which are large-scale even for moderately large power systems. The second aim
of the present paper consists in designing Lagrangian decomposition procedures
for the two models by exploiting the particular structure of the coupling
constraints.

The models arise from a cooperation with the electric utility VEAG Vereinigte
Energiewerke AG, which supplies the eastern part of Germany. The
generation system owned by VEAG (in 1995) consists of 25 (coal-fired or
gas-burning) thermal units and 6 pumped-storage hydro plants. Its total capacity
is about 13,150 megawatts (MW), including a hydro generation capacity of
1,700 MW; the system's peak load amounts to 8,620 MW (in 1995). Hence,
optimal scheduling of the VEAG system exhibits two special features: the
simultaneous optimization of thermal and hydro capacity is indispensable, and
the model becomes even larger when stochasticity is included. This
gives rise to the need for solution algorithms for large-scale stochastic
optimization problems which allow for handling mixed-integer decisions.
Existing solution procedures for large-scale stochastic programs are mostly 
based on approximating the underlying probability distribution by a dis- 
crete measure having finite support and on utilizing decomposition tech- 
niques for solving the large-scale approximate (deterministic) programs. For 
an overview and a discussion of much of the work done in this direction 
we refer to [15], [17], [20], [33], [52], [68]. In addition, we mention some of 
the recent relevant papers on decomposition approaches in stochastic pro- 
gramming. Primal decomposition techniques are based on the L-shaped or 
Benders decomposition method ([63]), its nested extension for multi-stage 
models ([4], [24]), and on regularized decomposition ([60]). A second group 
of (sometimes called dual or scenario) decomposition methods relax nonan- 
ticipativity constraints by introducing Lagrangian terms. For instance, the 
progressive hedging algorithm ([57]) and the scenario decomposition methods 
in [46], [59] are based on introducing augmented Lagrangians. Another aug- 
mented Lagrangian method by relaxing the recourse constraints is developed 
in [12]. A third group of methods consists of algorithms that combine decom- 
position and sampling techniques in various ways. For instance, sampling 
techniques are used for the generation of cuts in stochastic decomposition 
methods ([28]), for the efficient calculation of multivariate expected values 
by importance sampling ([30]), and for reducing the large dimensionality via 
EVPI-sampling ([13]) within nested Benders decomposition. Methods of a 
fourth group combine decomposition schemes and iterated approximations 
via refinement strategies (cf. [20], [21] and chapt. 3.5 in [33]). 

Most of these numerical methods cannot be applied directly to stochastic 
programs involving integrality constraints. Methods for solving (mixed-)integer
stochastic programs are rather rare. We refer to [62] for a brief overview
of some recent approaches to stochastic integer programming. Moreover, let 
us mention a recently developed stochastic branch and bound method ([61]) 
and a dual decomposition method based on relaxing the scenario constraints 
and on (deterministic) branch and bound techniques ([9]), which also applies 
to mixed-integer situations. 

Our paper is organized as follows. We introduce and discuss the two stochas- 
tic power scheduling models in Section 2. In Section 3 we briefly recall the 
Lagrangian relaxation approach and review some recent progress in solving 
the nondifferentiable duals. In the remaining two sections we develop La- 
grangian decomposition methods for the dynamic recourse as well as for the 
two-stage stochastic model by relaxing coupling constraints. The dualization 
argument and the duality gap, the separability structure and the solution of 
the stochastic single unit subproblems are discussed in more detail for the 
dynamic model. 



2 Models 

2.1 Modelling a Hydro-Thermal System 

We consider a power generation system comprising (coal-fired and gas-burn- 
ing) thermal units, pumped-storage hydro plants and interchange contracts 
between interconnected utilities. We will develop and describe a mathemat- 
ical model for a power system of this kind which has its origin in the earlier 
papers [25], [26]. The models allow for the simultaneous scheduling of all 
units and contracts over a certain time horizon. 

Let T denote the number of time intervals obtained by discretizing the oper- 
ation horizon. This discretization may be chosen uniformly (e. g. hourly or 
half-hourly) or non-uniformly. Let I and J denote the number of thermal and 
pumped-storage hydro units in the system. Delivery contracts are regarded 
as particular thermal units, but may have cost functions that are essentially 
different (e. g. nonconvex) from typical thermal costs. The decision variables 
in the model correspond to the outputs of each unit, i. e., the electric power 
generated or consumed by each unit of the system. These decision variables 
are denoted by 

$$u_i^t,\ p_i^t, \quad i = 1, \dots, I,\ t = 1, \dots, T,$$

$$s_j^t,\ w_j^t, \quad j = 1, \dots, J,\ t = 1, \dots, T,$$

where $u_i^t \in \{0, 1\}$ and $p_i^t$ are the on/off decisions and the production levels
of the thermal unit $i$ during the time period $t$. Correspondingly, $s_j^t$, $w_j^t$ are
the generation and pumping levels of the pumped-storage plant $j$ during the
period $t$, respectively. Thus, $u_i^t = 0$ and $u_i^t = 1$ mean that unit $i$ is off-line
and on-line during period $t$, respectively. Further, by $\ell_j^t$ we denote the
storage volume in the upper reservoir of plant $j$ at the end of interval $t$. All
variables mentioned above have finite upper and lower bounds representing
unit capacity limits and reservoir capacities of the generation system:

$$p_{it}^{\min} u_i^t \le p_i^t \le p_{it}^{\max} u_i^t, \quad u_i^t \in \{0, 1\}, \quad i = 1, \dots, I,\ t = 1, \dots, T,$$

$$0 \le s_j^t \le s_{jt}^{\max}, \quad 0 \le w_j^t \le w_{jt}^{\max}, \quad 0 \le \ell_j^t \le \ell_{jt}^{\max}, \quad j = 1, \dots, J,\ t = 1, \dots, T. \tag{2.1}$$



The constants $p_{it}^{\min}$, $p_{it}^{\max}$, $s_{jt}^{\max}$, $w_{jt}^{\max}$ and $\ell_{jt}^{\max}$ denote the minimal/maximal
outputs of the units and the maximal storage volumes in the upper reservoirs
during period $t$, respectively. The dynamics of the storage volume, which is
measured in electrical energy, is modelled by the equations: 



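$$\ell_j^t = \ell_j^{t-1} - s_j^t + \eta_j w_j^t, \quad \ell_j^0 = \ell_j^{in}, \quad \ell_j^T = \ell_j^{end}, \quad j = 1, \dots, J,\ t = 1, \dots, T. \tag{2.2}$$

Here, $\ell_j^{in}$ and $\ell_j^{end}$ denote the initial and final volumes in the upper reservoir,
respectively, and $\eta_j$ is the cycle efficiency of plant $j$. The cycle efficiency is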
defined as the quotient of the generation and of the pumping load that corre- 
spond to the same volume of water. The equalities (2.2 ) show, in particular, 
that there occur no in- or outflows in the upper reservoirs and, hence, that 
the pumped storage plants of the system operate with a constant amount 
of water. Together with the upper and lower bounds for $\ell_j^t$, the equations
(2.2) mean that certain reservoir constraints have to be maintained for all
pumped-storage plants during the whole time horizon. 
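The reservoir balance can be illustrated with a short sketch. It assumes the balance has the form $\ell^t = \ell^{t-1} - s^t + \eta\, w^t$ for a single plant; the function names and the numbers are hypothetical and serve only as an illustration.

```python
def storage_trajectory(l_init, s, w, eta):
    """Return [l^0, l^1, ..., l^T] under the assumed balance
    l^t = l^{t-1} - s^t + eta * w^t (volumes measured in electrical energy)."""
    levels = [l_init]
    for s_t, w_t in zip(s, w):
        levels.append(levels[-1] - s_t + eta * w_t)
    return levels


def reservoir_feasible(levels, l_max, l_end, tol=1e-9):
    """Check 0 <= l^t <= l_max for all t and the terminal condition l^T = l_end."""
    within_bounds = all(-tol <= l <= l_max + tol for l in levels)
    return within_bounds and abs(levels[-1] - l_end) <= tol


# Hypothetical three-period schedule: generate, pump, generate.
levels = storage_trajectory(100.0, s=[40.0, 0.0, 20.0], w=[0.0, 80.0, 0.0], eta=0.75)
print(levels)
print(reservoir_feasible(levels, l_max=150.0, l_end=100.0))
```

In this toy schedule, pumping 80 units of energy restores only 60 units of stored generation, which is how the cycle efficiency enters the balance.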

Further single-unit constraints are minimum up- and down-times and possible 
must-on/off constraints for each thermal unit. Minimum up- and down-time 
constraints are imposed to prevent the thermal stress and high maintenance
costs due to excessive unit cycling. Denoting by $\tau_i$ the minimum down-time
of unit $i$, the corresponding constraints are described by the inequalities:

$$u_i^{t-1} - u_i^t \le 1 - u_i^{\tau}, \quad \tau = t+1, \dots, \min\{t + \tau_i - 1, T\}, \quad t = 1, \dots, T. \tag{2.3}$$
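As a hedged illustration, the following sketch checks a given on/off schedule against minimum down-time inequalities of the form $u^{t-1} - u^t \le 1 - u^{\tau}$ for $\tau = t+1, \dots, \min\{t + \tau_i - 1, T\}$. The function name and the treatment of the initial state $u^0$ (taken equal to $u^1$ unless given) are assumptions made for this example.

```python
def violates_min_down_time(u, tau_min, u0=None):
    """Return True if the 0/1 schedule u = [u^1, ..., u^T] violates
    u^{t-1} - u^t <= 1 - u^tau for some tau in t+1..min(t+tau_min-1, T).
    u0 is the state before the horizon (assumed equal to u^1 if omitted)."""
    T = len(u)
    full = [u[0] if u0 is None else u0] + list(u)  # full[t] holds u^t, t = 0..T
    for t in range(1, T + 1):
        for tau in range(t + 1, min(t + tau_min - 1, T) + 1):
            if full[t - 1] - full[t] > 1 - full[tau]:
                return True
    return False


# Unit shut down in period 2 and restarted in period 5: down for three periods.
print(violates_min_down_time([1, 0, 0, 0, 1], tau_min=3))
# Unit shut down in period 2 and restarted already in period 3.
print(violates_min_down_time([1, 0, 1, 1, 1], tau_min=3))
```

The inequality only becomes active after a shut-down (left-hand side equal to 1), in which case it forces the unit to stay off-line in the following $\tau_i - 1$ periods.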



Analogous constraints can be formulated describing minimum-up times. Note 
that further single-unit constraints could be added, such as generator fuel 
limit constraints or air quality constraints in the form of limits on emissions 
from fossil-fired units. 

The next constraints are coupling across the units: the loading and reserve 
constraints. The first constraints are essential for the operation of the power 
system and mean that the sum of the output powers is greater than or equal 
to the load demand in each time period. Denoting by $d^t$ the load demand
during period $t$, the loading constraints are described by the inequalities:

$$\sum_{i=1}^{I} p_i^t + \sum_{j=1}^{J} (s_j^t - w_j^t) \ge d^t, \quad t = 1, \dots, T. \tag{2.4}$$



In order to compensate unexpected events within a specified short time pe- 
riod, a spinning reserve, describing the total amount of generation available 
from all units synchronized on the system minus the present load, is prescribed.
For instance, such events are sudden load increases and the outage
of one or more units. Beyond spinning reserve various classes of off-line re- 
serves may be involved. These include gas-turbine units and pumped-storage 
hydro plants that can quickly be brought on-line and up to full capacity. 
Hence, the spinning reserve constraints concern the synchronized thermal 
units and are given by the following inequalities: 

$$\sum_{i=1}^{I} (p_{it}^{\max} u_i^t - p_i^t) \ge r^t, \quad t = 1, \dots, T, \tag{2.5}$$

where $r^t > 0$ is a specified spinning reserve in period $t$.

The objective function is given by the total costs for operating the thermal 
units. These costs consist of the sum of the costs of each individual unit over 
the whole time horizon, i. e., 

$$\sum_{i=1}^{I} \sum_{t=1}^{T} \left[ FC_{it}(p_i^t, u_i^t) + SC_{it}(u_i(t)) \right], \tag{2.6}$$

where $FC_{it}$ is the fuel cost function and $SC_{it}$ are the start-up costs for the
operation of the thermal unit $i$ during period $t$. We make the natural
assumption that $FC_{it}(0, 0) = 0$ and that $FC_{it}(\cdot, 1)$ is strictly monotonically
increasing. Often fuel cost functions are piecewise linear-quadratic and
convex, i.e., they are functions of the form

$$FC_{it}(p, u) = \max_{l = 1, \dots, L} f_{il}(p) + u\, c_i, \tag{2.7}$$

where $f_{il}$ are linear or convex quadratic functions having the property
$\max_{l = 1, \dots, L} f_{il}(0) = 0$ and $c_i$ is a fixed cost term. Non-convex set-ups for fuel costs

are also possible and of particular importance for modelling costs in delivery 
contracts including discounts. Typical cost functions of this kind are general 
piecewise linear functions. Note that such functions can be modeled using 
binary variables for selecting the correct line segment for a given value of p 
(see e. g. [47]). 

The start-up costs $SC_{it}(u_i(t))$, where $u_i(t) = (u_i^1, \dots, u_i^t)$, can vary from a
maximum cold-start value to a much smaller value when the unit $i$ is still
relatively close to the operating temperature. A simple description for
start-up costs is given by

$$SC_{it}(u_i(t)) = c_i^f \max\{u_i^t - u_i^{t-1}, 0\}, \quad t = 2, \dots, T,$$

where $c_i^f$ are fixed costs. This description has the advantage that it can be
expressed in linear terms. On the other hand, it does not reflect that the
costs depend on the cooling time. Alternatively, a more involved start-up
cost function, which is time-dependent, is given by

$$SC_{it}(u_i(t)) = \left( c_i^f + c_i^c \left( 1 - \exp\left( -\frac{\tau_i^c}{\alpha_i} \right) \right) \right) \max\{u_i^t - u_i^{t-1}, 0\},$$

where $c_i^f$ are again fixed costs, $c_i^c$ cold-start costs, $\alpha_i$ the thermal time
constant for the unit $i$ and $\tau_i^c$ the down-time of unit $i$ until period $t$, i.e.,

$$\tau_i^c = \max\{ s \in \mathbb{N} : u_i^{t-j} = 0,\ j = 2, \dots, s \}.$$

Altogether, minimizing the objective function (2.6 ) subject to the constraints 
(2.1 )-(2.5 ) leads to a cost-optimal schedule for all units of the power system 
during the specified time horizon. It is worth mentioning that a cost-optimal 
schedule has the following two interesting properties, which are both a conse- 
quence of the strict monotonicity of the fuel costs. If a schedule (u, p, s, w) is 
optimal, then the loading constraints (2.4) are typically satisfied with equality
and we have $s_j^t w_j^t = 0$ for all $j = 1, \dots, J$, $t = 1, \dots, T$, i.e., generation
and pumping do not occur simultaneously (see [27]).

The minimization problem (2.1)-(2.6) represents a mixed-integer program
with (possibly) nonlinear objective, linear constraints, and $IT$ binary and
$(I + 2J)T$ continuous variables, respectively. For a typical configuration of
the VEAG-owned generation system with $I = 22$ (thermal), $J = 6$ (hydro)
and $T = 192$ (i.e., 8 days with hourly discretization), this amounts to 4224
binary and 6528 continuous variables.
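The variable counts quoted above follow directly from the formulas $IT$ and $(I + 2J)T$. A minimal sketch (the function name is hypothetical) reproduces them:

```python
def deterministic_model_size(I, J, T):
    """Number of binary (u) and continuous (p, s, w) variables of the
    deterministic model: I*T binaries and (I + 2*J)*T continuous variables."""
    return {"binary": I * T, "continuous": (I + 2 * J) * T}


size = deterministic_model_size(I=22, J=6, T=192)
print(size)
```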




Fig. 1: Load curve and hydro-thermal schedule

For this part of the power system and for a peak-load week, Figure 1 shows a
typical load curve and a corresponding cost-optimal hydro-thermal schedule. 
Note that the mixed-integer program is solved by the methods described in 
[14], which Figure 1 is taken from. The load curve in Figure 1 shows two 
types of cycles: In general, the load is higher during the morning and the
early evening (peak), with a small valley during the early afternoon, and it 
is lower during the night. In addition, the consumption of electric power 
exhibits a weekly cycle, because the load is lower over weekend days than 
weekdays. The efficient operation of pumped-storage hydro plants exploits 
these two cycles. They are designed to save fuel costs by serving the peak 
load with hydro-energy and then pumping to refill the reservoir during off- 
peak periods, i. e., during the nights and weekends. The hydro schedule in 
Figure 1 reflects this typical operation of pumped-storage plants. They may, 
in fact, be operated on a daily or weekly cycle. Figure 1 records a schedule 
when operating on a weekly cycle. The remaining load, i. e., the difference 
between the original system load and the hydro schedule, shows a much more 
uniform structure than the original load. This portion of the load is covered 
by the total thermal output. Among the thermal plants of the system, the 
base-load units are loaded nearly 100% of the time horizon and the "cycling"
units are loaded for periods depending on their costs and the shape of the 
load pattern. 

So far we have tacitly assumed that the electrical load is deterministic over 
the whole time horizon. In electric utilities, schedulers forecast the electrical 
load for each time period of the day or week in advance. For this purpose 
they make use of historical load data (e. g. of the same week from previous 
years), of their personal experience and of statistical methods (e. g. time 
series or regression analysis). But, clearly, the actual load demand may devi- 
ate from the predicted load at any time period for various reasons. Usually 
electric utilities record the actual system load and save the data over several 
years. These statistical data provide a basis for the development of stochastic 
models for the load process and the optimization of power scheduling. 

Next we describe two stochastic models for the optimal scheduling of electric
power which differ mainly in the quality of available information on the load 
stochasticity. The first one represents a model for the optimal on-line or 
short-term operation of a power system, where future consequences of actual 
scheduling decisions as well as the future load uncertainty are taken into ac- 
count. In this model we assume that the load is completely known (i. e., 
deterministic) at the beginning of the time horizon and that the load uncer- 
tainty increases with the growing number of time periods. Secondly, a model 
for short- or mid-term power production planning is developed. The essential 
difference to the first model is that the quality of available information on 
the load uncertainty does not depend on time. It aims at determining (op- 
timal) power production schedules for a future planning period (e. g. next 
week or month). The second model represents a two-stage stochastic pro- 
gram, whereas the first one is a dynamic (multi-stage) stochastic optimization 
problem. Both models involve mixed-integer decisions in all stages. 

2.2 Dynamic Recourse Model

We assume that the load $\{d^t : t = 1, \dots, T\}$ forms a (discrete-time) stochastic
process on some probability space $(\Omega, \mathcal{A}, \mu)$, that the information on the load
is complete for $t = 1$, and that the uncertainty increases with growing $t$. Let
$\{\mathcal{A}_t : t = 1, \dots, T\}$ be the filtration generated by the load process, where $\mathcal{A}_t$
is the $\mu$-completed $\sigma$-field defined by the random vector $(d^1, \dots, d^t)$. Hence,
we have $\mathcal{A}_1 \subseteq \mathcal{A}_2 \subseteq \dots \subseteq \mathcal{A}_t \subseteq \dots \subseteq \mathcal{A}_T \subseteq \mathcal{A}$ and $\mathcal{A}_1$ is the $\mu$-completion of
$\{\emptyset, \Omega\}$. The sequence of scheduling decisions $\{(u^t, p^t, s^t, w^t) : t = 1, \dots, T\}$
also forms a stochastic process on $(\Omega, \mathcal{A}, \mu)$, which is assumed to be adapted
to the filtration of $\sigma$-fields, i.e., nonanticipative. The latter condition means
that the decision $(u^t, p^t, s^t, w^t)$ depends only on the data history $(d^1, \dots, d^t)$
or, equivalently, that $(u^t, p^t, s^t, w^t)$ is $\mathcal{A}_t$-measurable. We mention that this
condition is often formulated in terms of a closed linear subspace that is
determined by the conditional expectations with respect to the $\sigma$-fields $\mathcal{A}_t$ ([55],
[12]). Since all decision variables are uniformly bounded, we may restrict
our attention to decisions $(u, p, s, w)$ belonging to $L^\infty(\Omega, \mathcal{A}, \mu; \mathbb{R}^m)$, where
$m := 2(I + J)T$. Then the nonanticipativity condition can be formulated
equivalently as



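$$x = (u, p, s, w) \in \mathop{\times}_{t=1}^{T} L^\infty(\Omega, \mathcal{A}_t, \mu; \mathbb{R}^{m_t}), \tag{2.8}$$

where $m_t := 2(I + J)$, and the (stochastic) optimization problem consists in
minimizing the expected cost (cf. (2.6))

$$F(x) = \mathbb{E} \left\{ \sum_{i=1}^{I} \sum_{t=1}^{T} \left[ FC_{it}(p_i^t, u_i^t) + SC_{it}(u_i(t)) \right] \right\} \tag{2.9}$$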



over all decisions $(u, p, s, w)$ satisfying the nonanticipativity constraint (2.8)
and $\mu$-almost surely the constraints (2.1)-(2.5). Among the constraints
(2.1)-(2.5), (2.2) and (2.3) reflect the dynamics of the model and (2.4),
(2.5) are coupling across units. Altogether, the stochastic program involves
$2(I + J)T$ stochastic decision variables and, hence, an enormous number of
stochastic scheduling decisions for real-life power generation systems. It is a
discrete time dynamic or multi-stage recourse problem, where the "stages" do
not necessarily refer to time periods, but correspond to steps in the decision
process where observations of the uncertain environment (i.e. the load) take
place. The number $K$ of stages of the dynamic model thus corresponds to
the (maximal) number of time steps $t_1 = 1 < t_2 < \dots < t_k < \dots < t_K = T$
such that we have the strict inclusion $\mathcal{A}_{t_k} \subset \mathcal{A}_{t_{k+1}}$, $k = 1, \dots, K - 1$, for
the $\sigma$-fields belonging to the filtration.

For the numerical solution of the dynamic recourse model we now assume
that a discrete multivariate probability distribution of the stochastic load
vector $d = (d^1, \dots, d^T)$, whose finite support consists of the atoms or scenarios
$d_n = (d_n^1, \dots, d_n^T)$, with the probabilities $\pi_n = \mu(d = d_n)$, $n = 1, \dots, N$, is
given. Let $n_k$, $k = 1, \dots, K$, denote the number of atoms corresponding to
the $\sigma$-field $\mathcal{A}_{t_k}$. Then we have $n_1 = 1 < n_2 < \dots < n_k < \dots < n_K = N$ and
the following scenario constraints at each stage $k \in \{1, \dots, K\}$:

$$d_n^{t_k} = d_{n'}^{t_k} \ \text{ implies } \ d_n^t = d_{n'}^t \ \text{ for all } t = 1, \dots, t_k. \tag{2.10}$$

Hence, the information on the load can be represented in the form of a sce- 
nario tree. Each path from the root to a leaf of the tree corresponds to one 
scenario; each branching node corresponds to a (decision) stage. Figure 2 
shows an example of a load scenario tree over a weekly time horizon, where 
observations of the load are made every day, leading to one additional daily 
scenario. 




Fig. 2: Load scenario tree (horizontal axis: Mon, Tue, Wed, Thu, Fri, Sat, Sun; stages labelled from k=1 to k=8)

The scenario information may have various origins. It can be obtained as 
an approximation of the multivariate load distribution, based on sampling 
from empirical data or on scenarios provided by experienced schedulers. We 
do not go into detail here, but refer to [16] (and the references therein) for a 
discussion of various approaches to the generation of scenarios that reflect the 
structure of the model as well as the information available on the underlying 
probability distribution. We also refer to [65] where several strategies for 
generating load scenarios (e. g. handling forecast uncertainty) are discussed. 
Although the primary aim of generating a scenario tree is to obtain a reason- 
able approximation for the underlying probability distribution, a compromise 
between the quality of approximation and the size of the approximate prob- 
lem has to be taken into consideration, too. The size of the scenario based 
multi-stage model easily grows out of hand with increasing number of scenarios
and stages. In order to illustrate this fact, let $u_{i,n}$, $p_{i,n}$, $s_{j,n}$, $w_{j,n}$ and
$\ell_{j,n}$ denote the $n$-th scenario of the variables $u_i$, $p_i$, $s_j$, $w_j$, and $\ell_j$. Then
the scenario based model consists in minimizing the objective function

$$\sum_{n=1}^{N} \pi_n \sum_{i=1}^{I} \sum_{t=1}^{T} \left[ FC_{it}(p_{i,n}^t, u_{i,n}^t) + SC_{it}(u_{i,n}(t)) \right] \tag{2.11}$$

over all decisions $\{(u_n, p_n, s_n, w_n) : n = 1, \dots, N\}$ satisfying the bound and
integrality constraints (2.1), the system dynamics

$$\ell_{j,n}^t = \ell_{j,n}^{t-1} - s_{j,n}^t + \eta_j w_{j,n}^t, \quad \ell_{j,n}^0 = \ell_j^{in}, \quad \ell_{j,n}^T = \ell_j^{end}, \quad j = 1, \dots, J,$$

$$u_{i,n}^{t-1} - u_{i,n}^t \le 1 - u_{i,n}^{\tau}, \quad \tau = t+1, \dots, \min\{t + \tau_i - 1, T\}, \quad i = 1, \dots, I, \tag{2.12}$$

$$t = 1, \dots, T, \quad n = 1, \dots, N,$$

the loading and reserve constraints

$$\sum_{i=1}^{I} p_{i,n}^t + \sum_{j=1}^{J} (s_{j,n}^t - w_{j,n}^t) \ge d_n^t, \quad \sum_{i=1}^{I} (p_{it}^{\max} u_{i,n}^t - p_{i,n}^t) \ge r^t, \quad t = 1, \dots, T,\ n = 1, \dots, N, \tag{2.13}$$

and the scenario nonanticipativity constraints, i.e., the equality 

K) Pn) «n. «^n) = pL 4- 

for $t = t_k$ implies that the same equality holds for all $t = 1, \dots, t_k$, $k = 1, \dots, K$.

When regarding the nonanticipativity constraints and introducing decision 
variables at each node of the scenario tree, the number of decisions in the 
(deterministic) optimization model (2.11 )-(2.13 ) amounts to 

$2(I + J) \sum_{k=1}^{K} n_k (t_{k+1} - t_k)$. Hence, the model may easily become extremely
large if the scenario tree contains too many paths. Even for the (very) small
scenario tree in Figure 2 (i.e., with $K = 7$, $n_k = k$ and $t_{k+1} - t_k = 24$)
the model involves $672 \cdot I$ binary and $672 \cdot (I + 2J)$ continuous variables, and
standard methods, including those reviewed in Section 1, may not be able to
solve the problem in reasonable time. This requires other techniques that
exploit the underlying structure of the original stochastic model. 
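The factor 672 can be checked with a short sketch. It assumes that the number of decisions equals $2(I + J)$ times the sum over stages of (number of tree nodes) times (number of periods in the stage); the function name is hypothetical, and the values $I = 22$, $J = 6$ are borrowed from the configuration of Section 2.1 purely for illustration.

```python
def tree_model_size(I, J, nodes_per_stage, stage_lengths):
    """Count variables of the scenario-tree model: one copy of the unit
    decisions per tree node and per time period within each stage."""
    node_periods = sum(n * length for n, length in zip(nodes_per_stage, stage_lengths))
    return {
        "node_periods": node_periods,
        "binary": node_periods * I,
        "continuous": node_periods * (I + 2 * J),
        "total": 2 * (I + J) * node_periods,
    }


# Tree of Figure 2: K = 7 stages, n_k = k nodes at stage k, 24 hourly periods per stage.
size = tree_model_size(I=22, J=6, nodes_per_stage=range(1, 8), stage_lengths=[24] * 7)
print(size)
```

Even this very small tree multiplies the number of node-periods by four compared with a single 168-hour scenario.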

2.3 Two-Stage Stochastic Model 

Again we assume the load $\{d^t : t = 1, \dots, T\}$ to be given as a (discrete-time)
stochastic process on some probability space $(\Omega, \mathcal{A}, \mu)$. However, this time
the load process does not involve an information structure and the decision 
process consists of two stages where the first-stage decisions correspond to 
the here-and-now schedules for all power generation units over the whole time 
horizon. The second-stage decisions correspond to future compensation or 
recourse actions of each unit in each time period in response to the environment
created by the chosen first-stage decision and the load realization in
that specific time period. Hence, the aim of such a two-stage dynamic model 
can be formulated as follows: Find an optimal schedule for the whole power 
system and planning horizon such that the uncertain demand can be com- 
pensated by the system, all system constraints are satisfied and the sum of 
the total generation costs and the expected compensation costs is minimal. 
In order to give a mathematical formulation of the model, let $(u, p, s, w)$
denote the first-stage scheduling decisions as in Section 2.1 and $(\tilde u, \tilde p, \tilde s, \tilde w)$
denote the stochastic compensation decisions having the components
$\tilde u_i^t$, $\tilde p_i^t$, $\tilde s_j^t$, $\tilde w_j^t$, $i = 1, \dots, I$, $j = 1, \dots, J$, $t = 1, \dots, T$, which correspond to the
compensation actions of each unit at time period $t$.

In addition to the (non-stochastic) constraints for (u, p, s, w), (2.1 ) (ca- 
pacity limits), (2.2 ) (storage dynamics), (2.3 ) (minimum down-time con- 
straints) and (2.5 ) (reserve constraints), we have to require that the compen- 
sation actions also satisfy certain system constraints. These are the unit ca- 
pacity limits, minimum-down time constraints and reservoir capacity bounds 



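$$p_{it}^{\min} \tilde u_i^t \le p_i^t \tilde u_i^t + \tilde p_i^t \le p_{it}^{\max} \tilde u_i^t, \quad \tilde u_i^t \in \{0, 1\}, \quad i = 1, \dots, I, \tag{2.14}$$

$$\tilde u_i^{t-1} - \tilde u_i^t \le 1 - \tilde u_i^{\tau}, \quad \tau = t+1, \dots, \min\{t + \tau_i - 1, T\}, \quad i = 1, \dots, I, \tag{2.15}$$

$$0 \le s_j^t + \tilde s_j^t \le s_{jt}^{\max}, \quad 0 \le w_j^t + \tilde w_j^t \le w_{jt}^{\max}, \quad 0 \le \ell_j^t + \tilde \ell_j^t \le \ell_{jt}^{\max},$$

$$\tilde \ell_j^t = \tilde \ell_j^{t-1} - \tilde s_j^t + \eta_j \tilde w_j^t, \quad \tilde \ell_j^0 = \tilde \ell_j^T = 0, \tag{2.16}$$

$$j = 1, \dots, J, \quad t = 1, \dots, T, \quad \mu\text{-a.s.}$$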

In other words, the constraints (2.16 ) for the hydro scheduling decisions 
mean that the sum of first-stage decisions and recourse actions is feasible, 
too. The formulation (2.14) of the thermal unit capacity limits for the
compensation stage becomes more involved because the term $p_i^t \tilde u_i^t$ introduces a
nonlinear constraint connecting first- and second-stage variables. The
nonlinearity in (2.14) is avoided when requiring that a thermal unit which is
scheduled to be on-line in the first stage must not be off-line in the
compensation action. In this case, (2.14) can be replaced by the (linear) constraints:

$$p_{it}^{\min} \tilde u_i^t \le p_i^t + \tilde p_i^t \le p_{it}^{\max} \tilde u_i^t, \quad u_i^t \le \tilde u_i^t, \quad \tilde u_i^t \in \{0, 1\}, \quad i = 1, \dots, I. \tag{2.17}$$

This formulation of the thermal unit capacity limits seems to be quite natural 
and realistic because generation systems often possess sufficient flexibility 
to compensate load decreases by lowering output levels of thermal units. 
However, there might be a need for new on-line units in order to compensate 
unpredictable load increases. Another possible compensation strategy could 
be based on a subdivision of the set of available thermal units into two sets
$\mathcal{I}_1$ and $\mathcal{I}_2$ such that $\mathcal{I}_1 \cup \mathcal{I}_2 = \{1, \dots, I\}$, and the conditions

$$\tilde u_i^t = u_i^t,\ i \in \mathcal{I}_1, \quad \text{and} \quad u_i^t \le \tilde u_i^t,\ i \in \mathcal{I}_2, \quad t = 1, \dots, T, \quad \mu\text{-a.s.},$$

are satisfied. This means that only some of the available thermal units may 
change their on/off state when compensating uncertain load. From a modelling
point of view this strategy would lead to a reduction of the number of
binary variables. 

In the following, we always assume that (2.17 ) instead of (2.14 ) is satisfied. 
Observe that the conditions (2.15 ) and (2.17 ) imply (2.3 ). 

The loading constraints (2.4 ) are modified by requiring that the sum of the 
first-stage power outputs of all generation units satisfies the load with some 
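probability $\pi_t \in (0, 1)$ in period $t = 1, \dots, T$, and that the sum of the total
power outputs satisfies the load with probability one. Denoting by $F_t$ the
distribution function of $d^t$, the (modified) loading constraints are given by
the following inequalities:

$$\sum_{i=1}^{I} p_i^t + \sum_{j=1}^{J} (s_j^t - w_j^t) \ge F_t^{-1}(\pi_t), \tag{2.18}$$

$$\sum_{i=1}^{I} (p_i^t + \tilde p_i^t) + \sum_{j=1}^{J} (s_j^t + \tilde s_j^t - w_j^t - \tilde w_j^t) \ge d^t \quad \mu\text{-a.s.}, \quad t = 1, \dots, T. \tag{2.19}$$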

A variant of (2.18), which will be considered in Section 5, is that the term
$F_t^{-1}(\pi_t)$ is replaced by the expected load $\mathbb{E}(d^t)$, $t = 1, \dots, T$. In both cases,
the constraint (2.18 ) means that the sum of the first-stage output power 
satisfies a certain predicted or approximated load and the second-stage deci- 
sions take care of satisfying the stochastic load with probability one. 

Since the real operation of the system takes place during the compensation 
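action, the objective function corresponds to the total average costs for
operating the thermal units, i.e.,

$$\mathbb{E} \left\{ \sum_{i=1}^{I} \sum_{t=1}^{T} \left[ FC_{it}(p_i^t + \tilde p_i^t, \tilde u_i^t) + SC_{it}(\tilde u_i(t)) \right] \right\}, \tag{2.20}$$

where $FC_{it}$ and $SC_{it}$ denote the fuel cost and start-up cost functions,
respectively, for the operation of unit $i$ during period $t$, and $\tilde u_i(t) := (\tilde u_i^1, \dots, \tilde u_i^t)$.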

The stochastic power production planning model then consists in minimizing
the objective function (2.20) over all deterministic decisions $(u, p, s, w)$
and all stochastic decisions $(\tilde u, \tilde p, \tilde s, \tilde w) \in L^\infty(\Omega, \mathcal{A}, \mu; \mathbb{R}^m)$ satisfying the
constraints (2.1), (2.2), (2.5), (2.15)-(2.19). The model represents a
two-stage stochastic mixed-integer program involving $2(I + J)T$ deterministic and
$2(I + J)T$ stochastic decision variables. Similar to the dynamic model in the
previous section, only the loading constraints (2.18 ), (2.19 ) and the reserve 
constraints (2.5 ) are coupling across units. 







3 Lagrangian Relaxation Approach 

Lagrangian relaxation is a solution technique primarily for minimizing a non- 
smooth function. We would like to recall the basic ideas and some facts in 
order to clarify the reasons that make this approach appropriate for solving 
the problems introduced in the previous section. Our presentation is inspired 
by [40]. Let us consider an optimization problem 

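$$\min f(x) \quad \text{subject to } x \in C,\ g(x) \le 0, \tag{3.1}$$

where $f : \mathbb{R}^n \to \mathbb{R}$, $C \subseteq \mathbb{R}^n$, $g : \mathbb{R}^n \to \mathbb{R}^m$. We make the general
assumption that $f$ and $g_j$, $j = 1, \dots, m$, are convex functions and there exists
an $\bar x \in C$ with $g(\bar x) < 0$.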

We suppose that the functions / and g and the set C have some special 
structure, which makes the Lagrangian problem 

$$\min\, [L(x, \lambda) = f(x) + \lambda g(x)] \quad \text{subject to } x \in C \tag{3.2}$$

much easier to solve than the problem (3.1), where $\lambda \in \mathbb{R}^m_+$. Let us assume
the following:

(A) For all $\lambda \in \mathbb{R}^m_+$ there exists an element $x_\lambda \in C$ such that

$$\Theta(\lambda) = \min_{x \in C} L(x, \lambda) = L(x_\lambda, \lambda).$$

Be aware that $L(\cdot, \lambda)$ may have several minima for some $\lambda$, but $\Theta(\lambda)$ is
well-defined, since the minimal value is unambiguous. By the weak duality
theorem, we have

$$\Theta(\lambda) \le \min_{x \in C} L(x, \lambda) \le f(x)$$

for all feasible points $x$ in (3.1). The following statement is straightforward
but important.

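Proposition 3.1 ([18]) Any solution $x_\lambda$ of the Lagrangian problem (3.2)
solves the perturbed problem (3.3):

$$\min f(x) \quad \text{subject to } x \in C,\ -\bar g(\lambda) \ge g(x), \tag{3.3}$$

where $\bar g(\lambda) = -g(x_\lambda)$.

Proof: For any feasible $x$ in (3.3) and $\lambda \in \mathbb{R}^m_+$ we have

$$f(x) \ge f(x) + \lambda [g(x) + \bar g(\lambda)] = L(x, \lambda) + \lambda \bar g(\lambda) \ge L(x_\lambda, \lambda) - \lambda \cdot g(x_\lambda) = f(x_\lambda). \qquad \Box$$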

We conclude that if $x_\lambda$ is "almost feasible", it is "almost a solution" of (3.1).
If we succeed in finding a solution to (3.2) which is also feasible for (3.1),
then we have a solution to (3.1), because the inequality of (3.3) is satisfied.
Having in mind the weak duality theorem, it is clear that any feasible point
$x$ of (3.1) produces an upper bound $f(x)$ for $\Theta(\lambda)$. Hence, to solve (3.1)
via (3.2) it is necessary to maximize $\Theta$ on $\mathbb{R}^m_+$.

We call $\Theta(\cdot)$ the dual function, $\lambda$ the dual variable, and the problem

$$\max \Theta(\lambda) \quad \text{subject to } \lambda \in \mathbb{R}^m_+ \tag{3.4}$$

the dual problem to (3.1). We show that $\Theta$ is a concave function having
subgradients at all $\lambda$ by virtue of the assumption (A). Let us denote a solution
of (3.2) for $\bar\lambda$ by $x_{\bar\lambda}$ and recall that $\bar g(\bar\lambda) = -g(x_{\bar\lambda})$. Then

$$\Theta(\lambda) = \min_{x \in C} L(x, \lambda) \le L(x_{\bar\lambda}, \lambda) = L(x_{\bar\lambda}, \bar\lambda) + (\lambda - \bar\lambda) g(x_{\bar\lambda}) = \Theta(\bar\lambda) - (\lambda - \bar\lambda) \bar g(\bar\lambda).$$

The latter inequality characterizes concavity and implies

$$\bar g(\bar\lambda) \in \partial[-\Theta(\bar\lambda)],$$

where $\partial[-\Theta(\bar\lambda)]$ stands for the subdifferential of $-\Theta$ with respect to $\lambda$
calculated at the point $\bar\lambda$.

Let us suppose that the problem under consideration has a separable struc- 
ture, i. e., the problem is of the following form: 
the variables $x = (x_1, \dots, x_n)$ and $x_i \in \mathbb{R}^{n_i}$, $i = 1, \dots, n$,

the objective function $f(x) = \sum_{i=1}^{n} f_i(x_i) + f_0$,

the related constraints $g_j(x) = \sum_{i=1}^{n} g_{ij}(x_i) + g_j^0$, $j = 1, \dots, m$,

where $f_0$ and $g_j^0$ ($j = 1, \dots, m$) are constants.

Let us further suppose some special structure of the set $C$. We assume the
set $C$ to be the following product

$$C = \mathop{\times}_{i=1}^{i_0} \{0, 1\}^{n_i} \times \mathop{\times}_{i=i_0+1}^{n} B_i,$$

where $B_i \subset \mathbb{R}^{n_i}$ are compact convex sets. This means that $x_1, \dots, x_{i_0}$ are
binary variables and we consider a mixed-integer problem.

Furthermore, let us assume the functions $f_i$ and $g_{ij}$ to be convex piecewise
linear or (piecewise) quadratic functions. Then $L(\cdot, \lambda)$ is a convex function,
too.

The strong duality theorem does not apply due to the presence of integrality, 
i. e. the structure of the set C. However, we are in a favourable situation to 
have 



• the assumption (A) is satisfied, 

• decomposable structure of the relaxed problem, 

• description of the subgradients of $\Theta(\lambda)$.

We call the following optimization problem a continuous relaxation of the
problem (3.1)

$$\min f(x) \quad \text{subject to } x \in \tilde C,\ g(x) \le 0,$$

where $\tilde C = \mathop{\times}_{i=1}^{i_0} [0, 1]^{n_i} \times \mathop{\times}_{i=i_0+1}^{n} B_i$.

Proposition 3.2 The Lagrangian relaxation provides a better lower bound 
of the optimal value of (3.1 ) than the continuous relaxation of the problem. 

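Proof: The following sequence of inequalities holds true for each $\lambda \in \mathbb{R}^m_+$:

$$\min_{x \in C,\ g(x) \le 0} f(x) \ge \min_{x \in C} L(x, \lambda) \ge \min_{x \in \tilde C} L(x, \lambda).$$

This implies

$$\min_{x \in C,\ g(x) \le 0} f(x) \ge \max_{\lambda \in \mathbb{R}^m_+} \min_{x \in \tilde C} L(x, \lambda).$$

The maximum above is attained at some $\lambda_0$ since $\min_{x \in \tilde C} L(x, \cdot)$ is a concave
piecewise linear or (piecewise) quadratic function bounded from above on
$\mathbb{R}^m_+$. Consequently, $L(x, \lambda)$ has a saddle point $(\lambda_0, x_{\lambda_0})$ and we obtain by
virtue of the saddle point theorem:

$$\min_{x \in C,\ g(x) \le 0} f(x) \ge \max_{\lambda \in \mathbb{R}^m_+} \min_{x \in \tilde C} L(x, \lambda) = L(x_{\lambda_0}, \lambda_0) = \min_{x \in \tilde C,\ g(x) \le 0} f(x).$$

Since the first chain of inequalities holds for every $\lambda$, the optimal value of
the Lagrangian dual, $\max_{\lambda} \min_{x \in C} L(x, \lambda)$, lies between the left-hand and
the right-hand side of the last relation. This proves the assertion. $\Box$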



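Observe that $L(x, \lambda)$ has a separable structure with respect to the
components $x_i$, which together with the special structure of $C$ leads to a
decomposition of the problem (3.2) into $n$ subproblems of dimension $n_i$ each. The
subproblems read

$$P_i(\lambda): \quad \min f_i(x_i) + \sum_{j=1}^{m} \lambda_j g_{ij}(x_i) \quad \text{subject to } x_i \in C_i,$$

where

$$C_i = \begin{cases} \{0, 1\}^{n_i} & \text{if } 1 \le i \le i_0, \\ B_i & \text{if } i_0 < i \le n. \end{cases}$$

Denoting the marginal functions of the problems above by $\Theta_i(\lambda)$ ($i = 1, \dots, n$)
we obtain for the dual function

$$\Theta(\lambda) = \sum_{i=1}^{n} \Theta_i(\lambda) + \sum_{j=1}^{m} \lambda_j g_j^0 + f_0.$$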

Consequently, the dual problem has a separable structure, too. The latter 
observations make an approach to problems with decomposable structure via 
Lagrangian relaxation attractive. A solution procedure should include: 
• a method for solving the non-smooth concave optimization problem
(3.4).

• fast algorithms for minimizing the Lagrange function $L(x, \lambda)$ at a given
point $\lambda$, i.e., for solving the subproblems $P_i(\lambda)$, $i = 1, \dots, n$. The
solution then provides the value of $\Theta$ and its subgradients.

• a technique to obtain a primal feasible solution. 

The latter point needs separate investigations. As already mentioned, a dual 
method does not provide a primal feasible solution due to the integrality 
conditions. Thus, we have to use the information on the dual solution to 
calculate a primal feasible point close to the dual solution efficiently. Due 
to the first proposition, such a procedure will obtain a fairly good point. In 
[2], it is shown that the relative duality gap for mixed integer problems with 
special structure becomes small under certain assumptions. We will see later 
how the estimate given there is modified for the dynamic recourse problem. 
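
The evaluation of the dual function through separable subproblems can be illustrated on a deliberately tiny instance. The following is only a sketch under stated assumptions, not the procedure used in this paper: the toy problem (linear costs `costs`, capacities `caps`, a single coupling constraint with right-hand side `demand`) and the function name are hypothetical. Each binary subproblem is solved by inspection of its reduced cost, and the constraint value at the minimizer serves as a subgradient of the concave dual function.

```python
def dual_value_and_subgradient(costs, caps, demand, lam):
    """Toy problem (an assumption for illustration):
        min sum_i c_i x_i   s.t.   demand - sum_i a_i x_i <= 0,  x_i in {0, 1}.
    Returns Theta(lam), a subgradient of Theta at lam, and the minimizer x(lam)."""
    theta = lam * demand      # constant part of the Lagrangian
    g = demand                # becomes g(x(lam)) = demand - sum_i a_i x_i
    x = []
    for c, a in zip(costs, caps):
        reduced = c - lam * a             # objective coefficient of subproblem P_i(lam)
        xi = 1 if reduced < 0 else 0      # each binary subproblem is solved by inspection
        x.append(xi)
        theta += reduced * xi
        g -= a * xi
    return theta, g, x


if __name__ == "__main__":
    # The primal optimum of this instance is 8 (both units on); every dual value is a lower bound.
    for lam in (2, 2.5, 3):
        print(lam, dual_value_and_subgradient([3, 5], [2, 2], 3, lam))
```

On this instance the dual values stay below the primal optimum, and the sign of the subgradient indicates whether the multiplier should be increased (violated constraint) or decreased.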
Methods for nonsmooth optimization have been the subject of intensive development
during the last 15 years. An algorithm for minimizing a convex function
known for a long time is the cutting-plane method. It develops the natural
idea to use subgradient information and to generate a linear approximation
of the function associated with it. Let us suppose that, at a certain moment,
values $f(x_1), \dots, f(x_k)$ and subgradients $z_1 \in \partial f(x_1), \dots, z_k \in \partial f(x_k)$
are available. We define

$$\hat f_k(x) = \max\{ f(x_i) + \langle z_i, x - x_i \rangle,\ i = 1, \dots, k \},$$

and, minimizing $\hat f_k$, obtain a further point $x_{k+1}$. It is assumed that $\hat f_k$ is
bounded from below on $C$ and we are able to compute values and subgradients
of $f$.
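
As an illustration, the cutting-plane idea can be sketched for a convex function of one variable on an interval. This is a minimal sketch under assumptions that are not in the text: the problem is one-dimensional, the set $C$ is an interval `[lo, hi]`, and the piecewise-linear model is minimized by enumerating the interval endpoints and the pairwise intersections of the cuts. All names are hypothetical.

```python
def cutting_plane_1d(f, subgrad, lo, hi, x0, tol=1e-8, max_iter=100):
    """Cutting-plane method for a convex function f on the interval [lo, hi]."""
    cuts = []                      # each cut is (slope z_i, intercept f(x_i) - z_i * x_i)
    x = x0
    best_x, best_f = x0, f(x0)
    for _ in range(max_iter):
        fx, z = f(x), subgrad(x)
        if fx < best_f:
            best_x, best_f = x, fx
        cuts.append((z, fx - z * x))

        def model(y):
            # piecewise-linear lower approximation built from all cuts so far
            return max(a * y + b for a, b in cuts)

        # In 1-D the model attains its minimum over [lo, hi] at an endpoint
        # or at an intersection point of two cuts.
        candidates = [lo, hi]
        for i in range(len(cuts)):
            for j in range(i + 1, len(cuts)):
                (a1, b1), (a2, b2) = cuts[i], cuts[j]
                if a1 != a2:
                    y = (b2 - b1) / (a1 - a2)
                    if lo <= y <= hi:
                        candidates.append(y)
        x = min(candidates, key=model)
        # model(x) is a lower bound on min f, best_f an upper bound
        if best_f - model(x) <= tol:
            break
    return best_x, best_f


if __name__ == "__main__":
    f = lambda x: abs(x - 1.0)
    g = lambda x: 1.0 if x >= 1.0 else -1.0   # one subgradient of |x - 1|
    print(cutting_plane_1d(f, g, -4.0, 6.0, -4.0))
```

The first iterations jump between the endpoints of the interval, which reflects the inefficiency of the initial iterations of the unstabilized method.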

However, this algorithm has some well-known drawbacks. The initial itera- 
tions are inefficient. The number of cuts increases after each iteration and 
there is no reliable rule for deleting them. The minimization of the approx- 
imate function is sensitive when approaching a point of nondifferentiability. 
Further developments have led to the so-called bundle methods which offer 
a stabilizing device based on the following ingredients: 

• a sequence {xn} of stabilized iterates; 

• a criterion (test) deciding whether a new iterate has been found and (or) 
whether the bundle of information, i.e., the approximation fk, should 
be enriched; 

• a sequence {Mn} of positive definite matrices used for a stabilizing 
term. 

Bundle methods were pioneered by Wolfe and Lemaréchal. A detailed study
on the subject can be found in [35] and [29]. A comprehensive review is given
in [43]. One description of the main idea of (first-order) bundle methods is
the following:

Suppose iterate $x_n$ and a bundle of subgradients $z_k \in \partial f(y_k)$ have been
computed. As above, we use the bundle of information to formulate a lower
approximation of the function $f$, i.e.,
$\hat f_k(x) = \max\{ f(y_i) + \langle z_i, x - y_i \rangle,\ i = 1, \dots, k \}$, and

1. minimize $\hat f_k(x) + \frac{1}{2} \langle M_n (x - x_n), x - x_n \rangle$
and let the point $\bar x$ be its minimal point.

2. compute a nominal decrease

$\alpha_n = f(x_n) - \hat f_k(\bar x) - \frac{1}{2} \langle M_n (\bar x - x_n), \bar x - x_n \rangle.$

A constant $c \in (0, 1)$ being chosen, we perform the descent test:
$f(\bar x) \le f(x_n) - c\,\alpha_n.$

If the inequality is satisfied we set $x_{n+1} = \bar x$, $y_{k+1} = \bar x$
and increase $n$ and $k$ by 1.

Otherwise, $n$ is kept fixed, we set $y_{k+1} = \bar x$ and increase $k$ by 1. In
some versions (cf. [42]) an additional test is made before increasing $k$.

3. The choice of $\{M_n\}$ given in the literature is:

- an abstract sequence, as in [39],

- $M_n = I$, as in [34],

- $M_n = \mu_n I$ with heuristic rules for computing $\mu_n$, in [36], [64],

- solving a quasi-Newton equation in [42].
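
The steps above can be sketched for a function of one variable with the simplest choice $M_n = \mu I$ for a fixed $\mu > 0$. This is only an illustrative sketch, not one of the cited codes: the fixed weight `mu`, the ternary search used for the stabilized subproblem, and all names are assumptions. The search bracket uses the fact that the minimizer of the stabilized model lies within distance $\max_i |z_i| / \mu$ of the current iterate.

```python
def prox_bundle_1d(f, subgrad, x0, mu=1.0, c=0.1, tol=1e-6, max_iter=200):
    """First-order proximal bundle sketch in one variable with M_n = mu * I."""
    xn, fxn = x0, f(x0)                      # stabilized iterate x_n and its value
    z0 = subgrad(x0)
    cuts = [(z0, fxn - z0 * x0)]             # bundle: (slope, intercept) of each cut
    for _ in range(max_iter):
        def obj(y):
            # cutting-plane model plus proximal (stabilizing) term
            return max(a * y + b for a, b in cuts) + 0.5 * mu * (y - xn) ** 2

        radius = max(abs(a) for a, _ in cuts) / mu
        lo, hi = xn - radius, xn + radius
        for _ in range(200):                 # ternary search on a strictly convex function
            m1 = lo + (hi - lo) / 3.0
            m2 = hi - (hi - lo) / 3.0
            if obj(m1) <= obj(m2):
                hi = m2
            else:
                lo = m1
        y = 0.5 * (lo + hi)
        alpha = fxn - obj(y)                 # nominal decrease
        if alpha <= tol:
            break
        fy, z = f(y), subgrad(y)
        cuts.append((z, fy - z * y))         # the bundle is enriched in both cases
        if fy <= fxn - c * alpha:            # descent test
            xn, fxn = y, fy                  # descent step: move the stabilized iterate
        # otherwise: null step, x_n is kept fixed
    return xn, fxn


if __name__ == "__main__":
    f = lambda x: abs(x - 1.0)
    g = lambda x: 1.0 if x >= 1.0 else -1.0
    print(prox_bundle_1d(f, g, 5.0))
```

In contrast to the plain cutting-plane method, the proximal term keeps each trial point close to the current stabilized iterate, so no bounded feasible interval is needed.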

This description of the bundle methods corresponds to the proximal point
concept (i.e., the Moreau-Yosida regularization). Recall that, given a positive
semi-definite matrix $M$,

$$F(x) = \inf_{y} \left\{ f(y) + \tfrac{1}{2} \langle M (y - x), y - x \rangle \right\} \qquad (3.5)$$

is the Moreau-Yosida regularization of the function $f$. In the classical framework
$M$ should be positive definite. In [42], it is suggested to allow a degenerate
proximal term and it is shown there that the essential properties can
be reproduced also in this case. A relationship between these concepts and
certain first order bundle methods was observed by several authors, e.g. [29].
Methods of order higher than one are studied in [36] and [64] where a single
stabilizing parameter is varied.
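
A small worked example may help to see the smoothing effect of the regularization. It is an illustration only and is not taken from the cited references: we assume the scalar case $f(y) = |y|$ and $M = \mu > 0$, and write $y(x)$ for the minimizer in (3.5).

```latex
% Moreau-Yosida regularization of f(y) = |y| with M = \mu > 0 (scalar case).
% The minimizer of |y| + (\mu/2)(y - x)^2 is obtained by soft thresholding:
\[
  y(x) =
  \begin{cases}
    0,                                  & |x| \le 1/\mu, \\
    x - \operatorname{sign}(x)/\mu,     & |x| > 1/\mu,
  \end{cases}
\]
% Substituting y(x) into the objective gives
\[
  F(x) =
  \begin{cases}
    \tfrac{\mu}{2}\, x^2,               & |x| \le 1/\mu, \\
    |x| - \tfrac{1}{2\mu},              & |x| > 1/\mu,
  \end{cases}
  \qquad
  \nabla F(x) = \mu\,\bigl(x - y(x)\bigr).
\]
```

The function $F$ is continuously differentiable although $f$ is not, and both functions have the same minimizer $x = 0$.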

In [36] the choice of weights $\mu$ for updating the matrix in the proximal term
is considered. The matrix $M$ is intended to accumulate information about
the curvature of $f$ around the point $x$. Safeguarded quadratic interpolation is
proposed for choosing the weights $\mu_{n+1}$ so that the curvature of $f$ between $x_n$
and $\bar x$ is estimated. The algorithm computes a direction for the next iterate
$x_{n+1}$ by solving a quadratic program, then the descent test and the update
of the bundle of subgradients are modified accordingly. The reported computational
experiments indicate that this technique can significantly decrease the number of
objective evaluations necessary for reaching a desired accuracy in the optimal
value.

The algorithms presented in [7], [42], [44], referred to as variable metric bun- 
dle methods, make use of the Moreau- Yosida regularization of the objective 
function and develop some quasi-Newton formulas. Two strategies for updat- 
ing the matrix M in the minimization procedure are suggested in [42]. In the 
first version, called diagonal quasi-Newton method, M is proportional to the 
identity matrix, while the second version uses a full quasi-Newton matrix. 
The matrix is updated at the end of a descent-step, when a new stabilizing 
iterate point is computed. The updating procedure corresponds to a regular- 
izing scheme for the gradient of F. 

In [44] $M$ is a positive definite matrix and, thus, there is a unique solution
of (3.5), which is denoted by $y(x)$. The main idea is to approximate $y(x)$
and to vary the matrix $M$ in order to use the information gathered in finding
one approximation to help in finding the next one. Let $J$ be some approximation
of the Jacobian $J(x)$ of $y(x)$. A Newton step $-[\nabla^2 F(x)]^{-1} \nabla F(x)$
is approximated there by

$$[M(I - J)]^{-1} M (y(x) - x) = [I - J]^{-1} (y(x) - x),$$

where $I$ is the identity matrix. $M$ could be fixed or updated using
some constant $\mu_n$ and an estimate $G_n$ of $\nabla F$ computed from
information from previous iterations. How to compute the necessary estimate
$J$ of the Jacobian matrix of $y(x)$ is discussed in detail in [44]. The method
developed there is called approximate Newton method.

A precise study of the second-order properties of the Moreau-Yosida regularization
is presented in [45] for the problem of minimizing a closed proper
convex function, which is a selection of a finite number of twice continuously
differentiable functions. It is proved that under a certain constraint qualification
the gradient $\nabla F_M$ is piecewise smooth. Further conditions are formulated
that guarantee a superlinear (quadratic) convergence of an approximate
Newton method for minimizing $F$.

Generally, one can consider any Newton-type method for nonsmooth equa- 
tions in order to solve optimization problems. Newton-type methods in such 
a generality are considered in e. g. [38], [50], [53], [54]. The methods pre- 
sented there are applied to solving optimization problems via augmented 
Lagrangians [54], via the Karush-Kuhn-Tucker equations [38], [53] or via the 
Moreau- Yosida regularization [5]. 

Our review is not an attempt to comment on all recent developments of solution
techniques for nonsmooth optimization problems. We only wish to present
the main ideas of the well-established methods in order to clarify which of 
them are appropriate for solving the nonsmooth problems studied in the next 
two sections. 
4 Lagrangian Relaxation for the Dynamic
Recourse Problem



In this section, we consider the Lagrangian relaxation approach for the dynamic
recourse model (2.1)-(2.9) in detail and sketch a conceptual algorithm
for solving the problem. The decision variables are uniformly bounded functions
$(u, p, s, w) \in \times_{t=1}^{T} L^\infty(\Omega, \mathcal{A}_t, \mu; \mathbb{R}^{m_t})$, $m_t = 2(I + J)$. The variables
$(u_i, p_i)$, $i = 1, \dots, I$, and $(s_j, w_j)$, $j = 1, \dots, J$, are associated with one single
operation unit $i$, and $j$, respectively. All constraints except for (2.4) and
(2.5) are associated with a single operation unit. Thus, natural candidates
for the relaxation are the coupling constraints (2.4) and (2.5). We associate
Lagrange multipliers $\lambda_1$ and $\lambda_2$ with the load- and reserve-constraints,
respectively. Setting $x = (u, p, s, w)$ and

$$L(x, \lambda) = \mathbb{E} \Big\{ \sum_{i=1}^{I} \sum_{t=1}^{T} \big[ FC_{it}(p_i^t, u_i^t) + SC_{it}(u_i(t)) \big] + \sum_{t=1}^{T} \lambda_1^t \Big( d^t - \sum_{i=1}^{I} p_i^t - \sum_{j=1}^{J} (s_j^t - w_j^t) \Big) + \sum_{t=1}^{T} \lambda_2^t \Big( r^t - \sum_{i=1}^{I} (u_i^t p_i^{\max} - p_i^t) \Big) \Big\} \qquad (4.1)$$



we have to clarify what kind of objects $\lambda_1$ and $\lambda_2$ are. Duality theorems
for dynamic models that are relevant for our setting are considered in [56],
[58]. We utilize the results of [56]. For stating a duality result we neglect
integrality and substitute $u_i^t \in \{0, 1\}$ by $u_i^t \in [0, 1]$ in (2.1) for a moment.
We denote the modified constraint by (2.1)*.

First, let us recall that the dynamic recourse problem has relatively complete
recourse if the following procedure leads to a choice of decisions $x_k$, $k = 1, \dots, K$,
almost surely for all stages $k$: Let $x_1$ be a feasible solution of the
first stage. In the second stage (having a new observation of the load), we
can choose $x_2$ satisfying the constraints and the dynamics of the system, i.e.,
in particular, (2.2) and (2.3) hold true with the corresponding components
of $x_1$ and $x_2$. And so forth: In the $k$-th stage, we are able to choose a feasible
decision $x_k$.

Nonanticipativity and relatively complete recourse provide sufficient conditions
for considering $L^1$ to be the space of Lagrange multipliers $\lambda$, instead of
working with esoteric objects from $(L^\infty)^*$ (cf. [56]).

Suppose, additionally, that strict feasibility holds true. It means that the
feasible set determined by (2.1)*-(2.5) has a non-empty interior in
$\times_{t=1}^{T} L^\infty(\Omega, \mathcal{A}_t, \mu; \mathbb{R}^{m_t})$, i.e., there exists a positive real number $\varepsilon$, a point
$\bar x \in \times_{t=1}^{T} L^\infty(\Omega, \mathcal{A}_t, \mu; \mathbb{R}^{m_t})$ and a neighbourhood $U$ of $\bar x$ such that any
point $x = (u, p, s, w) \in U$ satisfies (2.1)*-(2.3) and the inequalities:

$$\sum_{i=1}^{I} p_i^t + \sum_{j=1}^{J} (s_j^t - w_j^t) \ge d^t + \varepsilon,$$

$$\sum_{i=1}^{I} (u_i^t p_i^{\max} - p_i^t) \ge r^t + \varepsilon.$$

In terms of a power generation system, strict feasibility means that the generation
system should have the capacity to produce power that satisfies every
slightly changed demand and reserve condition while respecting the other constraints.
This is a reasonable and acceptable restriction, which can be assumed
to be satisfied.

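We denote

$$X = \Big\{ x \in \times_{t=1}^{T} L^\infty(\Omega, \mathcal{A}_t, \mu; \mathbb{R}^{m_t}) : \text{(2.1)* - (2.5) are fulfilled} \Big\};$$

$$\Lambda = \Big\{ \lambda \in \times_{t=1}^{T} L^1(\Omega, \mathcal{A}_t, \mu; \mathbb{R}^2) : \lambda_1^t, \lambda_2^t \ge 0 \ \ \mu\text{-a.s. for } t = 1, \dots, T \Big\}.$$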

The following duality statement holds true. 



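Proposition 4.1 The Lagrange function (4.1) has at least one saddle point
$(\bar x, \bar\lambda) \in X \times \Lambda$ assuming (2.6), (2.7), relatively complete recourse and strict
feasibility. In order that the function $\bar x \in X$ be an optimal solution of the
problem (2.1)-(2.9) it is necessary and sufficient that the following conditions
be satisfied a.s. for some $\bar\lambda \in \Lambda$:

$$\begin{aligned}
\bar\lambda_1^t \Big( d^t - \sum_{i=1}^{I} \bar p_i^t - \sum_{j=1}^{J} (\bar s_j^t - \bar w_j^t) \Big) &= 0, \\
\bar\lambda_2^t \Big( r^t - \sum_{i=1}^{I} (\bar u_i^t p_i^{\max} - \bar p_i^t) \Big) &= 0, \\
0 &\in \partial_x L(\bar x^t, \bar\lambda^t), \qquad t = 1, \dots, T.
\end{aligned} \qquad (4.2)$$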



Proof: The assertion follows by Theorem 1 and the arguments of Theorem 
7 from [56]. □ 



Now we consider the relaxed problem:

$$\min_{(u,p,s,w)} L(u, p, s, w; \lambda) \quad \text{subject to (2.1) - (2.3).} \qquad (4.3)$$

Denoting the marginal function of the latter problem by $\Theta(\lambda)$, the dual problem
reads

$$\max \Theta(\lambda) \quad \text{subject to } \lambda \in \Lambda. \qquad (4.4)$$

Now, we show that the dual problem is decomposable with respect to the
single units. Using the notations of the previous section, we define

$x_i = (u_i, p_i)$, $i = 1, \dots, I$, $\quad x_{I+j} = (s_j, w_j)$, $j = 1, \dots, J$, $\quad n = I + J$,







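and observe that all functions are separable with respect to $i = 1, \dots, n$.
We define functions $\Theta_i(\cdot)$ and $\Theta_j(\cdot)$:

$$\begin{aligned}
\Theta_i(\lambda) &= \min_{(u_i, p_i)} \mathbb{E} \sum_{t=1}^{T} \big[ FC_{it}(p_i^t, u_i^t) + SC_{it}(u_i(t)) - \lambda_1^t p_i^t - \lambda_2^t (u_i^t p_i^{\max} - p_i^t) \big] \\
&= \min_{u_i} \mathbb{E} \sum_{t=1}^{T} \Big[ \min_{p_i^t} \big\{ FC_{it}(p_i^t, u_i^t) - (\lambda_1^t - \lambda_2^t)\, p_i^t \big\} + SC_{it}(u_i(t)) - \lambda_2^t u_i^t p_i^{\max} \Big].
\end{aligned}$$

The latter equality holds by the separable structure of the functions $FC_{it}$
with respect to $p_i^t$ and $u_i^t$ (cf. (2.7)) and the possibility to exchange $\min$ and
$\mathbb{E}$ in the above expression.

$$\Theta_j(\lambda) = \min_{(s_j, w_j)} \mathbb{E} \sum_{t=1}^{T} \big[ -\lambda_1^t (s_j^t - w_j^t) \big].$$

Consequently, the function $\Theta(\lambda)$ can be expressed as:

$$\Theta(\lambda) = \sum_{i=1}^{I} \Theta_i(\lambda) + \sum_{j=1}^{J} \Theta_j(\lambda) + \mathbb{E} \sum_{t=1}^{T} \big[ \lambda_1^t d^t + \lambda_2^t r^t \big].$$

It has a separable structure with respect to the single units as do the constraints
(2.1)-(2.3), (2.6)-(2.8). Thus, the value and subgradients of
$\Theta(\lambda)$ can be computed for a given argument $\lambda$ by solving the subproblems
$P_i(\lambda)$, $i = 1, \dots, I$, and $P_j(\lambda)$, $j = 1, \dots, J$:

$$P_i(\lambda): \quad \min_{u_i} \mathbb{E} \sum_{t=1}^{T} \Big[ \min_{p_i^t} \big\{ FC_{it}(p_i^t, u_i^t) - (\lambda_1^t - \lambda_2^t)\, p_i^t \big\} + SC_{it}(u_i(t)) - \lambda_2^t u_i^t p_i^{\max} \Big] \quad \text{subject to (2.1), (2.3)}$$

$$P_j(\lambda): \quad \min_{(s_j, w_j)} \mathbb{E} \sum_{t=1}^{T} \big[ -\lambda_1^t (s_j^t - w_j^t) \big] \quad \text{subject to (2.1), (2.2), (2.6)}$$

Note that these are dynamic recourse problems themselves associated with
the single generation units. The subgradients of $\Theta(\lambda)$ with respect to $\lambda_1$ and
$\lambda_2$ are given by

$$d^t - \sum_{i=1}^{I} p_i^t - \sum_{j=1}^{J} (s_j^t - w_j^t),$$

$$r^t - \sum_{i=1}^{I} (u_i^t p_i^{\max} - p_i^t),$$

where $(u_i^t, p_i^t)$ and $(s_j, w_j)$ are solutions of $P_i(\lambda)$, $i = 1, \dots, I$, and $P_j(\lambda)$, $j = 1, \dots, J$,
respectively.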
Now, we suppose that the measure $\mu$ has a finite support. Then the dynamic
recourse problem can be viewed as a large-scale finite-dimensional optimization
problem. Let us check the properties of $\Theta$ discussed in the previous
section. The concavity of $\Theta$ follows immediately. The assumptions of Proposition 3.1
and Proposition 3.2 are satisfied for this problem. Observe that the
assumption (A) of Section 3 is satisfied, too, i. e., the feasible set with respect 
to the continuous variables is a compact set because of (2.1 ). Therefore, the 
necessary properties for a nonsmooth optimization method of the kind dis- 
cussed in Section 3 are at hand provided that efficient algorithms for solving 
the subproblems are available. Consequently, we shall have established an 
algorithm for solving the problem (2.1 ) - (2.9 ) if the following points are 
clarified: 

• approximation of the stochastic process by a scenario tree; 

• choice of an appropriate method for solving the dual problem (4.4 ); 

• efficient algorithms for solving the subproblems $P_i(\cdot)$, $P_j(\cdot)$;

• gaining information from the solution of the dual problem (4.4 ) for 
computing a primal feasible solution (Lagrangian heuristics) and pro- 
viding an estimation of the occurring relative duality gap. 

Let us comment on all of these points. The stochastic process $d$ can be
approximated by means of an analysis of statistical data using also expert 
knowledge. The first thing to clarify is the nature of the demand random- 
ness. In order to estimate the load of the system one usually uses the data of 
the same week from previous years, data of days with similar weather condi- 
tions, and the experience of experts. The strategy of creating scenarios has 
to reflect truly all possible future demands. The number of scenarios that 
approximate the demand has to be chosen in such a way that a fairly good 
approximation is obtained but the speed of the optimization procedure is not 
affected critically since the execution time of the algorithm grows rapidly as 
the number of scenarios included increases. The probability assigned to each 
scenario can be calculated according to the likelihood of its occurrence. 

The functions $FC_{it}$ and $SC_{it}$, $i = 1, \dots, I$, $t = 1, \dots, T$, are assumed to be
piecewise linear or quadratic. Consequently, the function $\Theta(\lambda)$ is piecewise
twice continuously differentiable. Therefore, any method of non-smooth optimization
of those discussed in the previous section could be applied. The
methods developed as bundle methods of order higher than one could be ap- 
plied successfully, e. g. [36], [42], [44]. Unfortunately, for those guaranteeing 
superlinear convergence ([45]), no computational code is available up to now. 
The variable metric bundle methods [36], [42], [44] provide convergence but 
no estimate of the rate is given. We would like to emphasize that those meth- 
ods are finite for piecewise linear convex functions. The published experience 
with NOA Version 3.0 ([37]) reports fast convergence in practice (cf. [36]). 
The efficiency of the optimization algorithm depends to great extent on the 
fast computation of the values and subgradients of the objective function
$\Theta(\lambda)$. Therefore, the development of fast algorithms for solving the problems
$P_i(\lambda)$ and $P_j(\lambda)$, $i = 1, \dots, I$, $j = 1, \dots, J$, is important. An algorithm
for solving the problems $P_j(\lambda)$, $j = 1, \dots, J$, has been developed in [48]. It
regards $P_j(\lambda)$ as a network-flow problem and suggests a procedure adapted
to the structure called EXCHA. The crucial point in this procedure is the
selection of a proper direction from a prescribed subset of descent directions
for minimizing the objective of $P_j(\lambda)$. Let us consider the problems
$P_i(\lambda)$, $i = 1, \dots, I$. The inner minimization (with respect to $p_i$) can be done
explicitly or by one-dimensional optimization. Further, a dynamic programming
procedure can be used to minimize the expected costs with respect to
the integer variables $u_i$. A state transition graph of the unit for each scenario,
respecting the nonanticipativity constraint, can be considered. Then the solution
corresponds to a tree in this graph that has minimal weighted length.
In order to reduce the number of nodes, we can include the constraints (2.3)
into the process of generating the state transition graph by setting nodes
"off" for at least $\tau_i$ periods.
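
The dynamic programming step can be sketched for a single unit and a single scenario. This is a simplified illustration under assumptions that are not in the text: the scenario tree is reduced to one deterministic path, the inner minimization over $p_i$ is assumed to be already carried out and summarized in a reduced cost `on_cost[t]` for running the unit in period `t`, start-up costs are a constant `startup_cost`, and the unit is assumed to have been off long enough initially. The states encode the minimum down-time by counting the periods the unit has been off.

```python
def commit_single_unit(on_cost, startup_cost, min_down):
    """DP over a state transition graph for one unit and one scenario.
    State 0: unit is on. State k (1..min_down): unit has been off for k periods,
    where state min_down means 'off for at least min_down periods'."""
    tau = min_down
    inf = float("inf")
    value = [inf] * (tau + 1)
    value[tau] = 0.0                      # assumption: unit starts off and may start up
    back = []                             # predecessor states, one list per period
    for c in on_cost:
        new = [inf] * (tau + 1)
        pred = [None] * (tau + 1)
        for s, v in enumerate(value):
            if v == inf:
                continue
            if s == 0:                    # on: stay on, or switch off
                moves = [(0, c), (1, 0.0)]
            else:                         # off: stay off, start up only after min_down
                moves = [(min(s + 1, tau), 0.0)]
                if s == tau:
                    moves.append((0, c + startup_cost))
            for nxt, cost in moves:
                if v + cost < new[nxt]:
                    new[nxt] = v + cost
                    pred[nxt] = s
        back.append(pred)
        value = new
    # recover the optimal on/off schedule by backtracking
    s = min(range(tau + 1), key=lambda k: value[k])
    total = value[s]
    schedule = []
    for pred in reversed(back):
        schedule.append(1 if s == 0 else 0)
        s = pred[s]
    schedule.reverse()
    return total, schedule


if __name__ == "__main__":
    # negative reduced costs occur when the multipliers make production attractive
    print(commit_single_unit([-2.0, 1.0, 1.0, -3.0], startup_cost=1.0, min_down=2))
```

In this small instance the unit is switched off for exactly the minimum down-time during the two unattractive periods. Extending the sketch to a scenario tree would require one such state vector per node of the tree, with expected costs accumulated over the children of each node.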

Another substantial part of the solution procedure for the dynamic recourse
problem consists in developing an algorithm for the determination of a primal
feasible solution after one has found a solution of the dual problem. As
already established, if we find an "almost" feasible point, it is "almost" a
solution (Proposition 3.1). In addition, the optimal value $\Theta(\lambda)$ of the dual
problem is a better lower bound of the objective function of the primal problem
than the value of its continuous relaxation. It is possible to use some
modification of the heuristic procedure presented for this purpose in [70] and
further modified as in [14]. Recent publications [19], [41] suggest heuristics
based on relaxed convexified primal problems. These procedures are not
directly applicable to the problem considered here due to the presence of
pumped-storage hydro plants. An adaptation of these ideas to our setting
needs further investigation.

In our case, the Lagrangian heuristics could work as follows: 

• try to satisfy the reserve-constraints by using pumped-storage hydro
plants in those time intervals, where the largest values of $\lambda_2$ occur.
If the reserve-constraints are still violated, use the procedure of [70].

• improve the feasible solution found at the end of the procedure above by 
solving the problem keeping the integer variables fixed. An algorithm 
for the latter problem is suggested in [49] that is a modification of 
the network-flow algorithm in [48]. The problem is considered as a 
network-flow problem again and the algorithm makes use of its special 
structure. 

Summarizing, the presented solution technique includes the following basic 
steps: 

• generation of a scenario tree (discrete approximation of d) 
• solving the problem (4.4), e.g. by NOA Version 3.0,

• solving the problems $P_i(\lambda)$, $i = 1, \dots, I$, by dynamic programming and
$P_j(\lambda)$, $j = 1, \dots, J$, by EXCHA.

• determination of a primal feasible solution by the procedure described 
above. 

An illustrative example for an approximation of the load is given in Figure 3 
and Figure 4 expresses the corresponding stochastic schedule for fixed binary 
variables. The values of the approximative load are generated by using the 
value of a given load, and a standard normal random variable (see [49] for 
details). 

A final remark is due. There is an estimate for the occurring duality gap. We
use the description of the problem (2.11)-(2.13) based on scenarios. At this
place, we incorporate the nonanticipativity condition into the representation
of the model. More precisely, we consider decisions $x_n^t = (u_n^t, p_n^t, s_n^t, w_n^t)$
and $x_{\tilde n}^t$ that correspond to scenarios $n$ and $\tilde n$ fulfilling $d_n^t = d_{\tilde n}^t$ for all
$t = 1, \dots, t_k$ as indistinguishable up to the stage $k$. We use only one notation for
the decisions at stage $k$ for all scenarios that are indistinguishable up to that
stage. Recall that the number of scenarios at the stage $k$ is denoted by $n_k$ and
the number of load and reserve-constraints amounts to $2 \sum_{k=1}^{K} n_k (t_{k+1} - t_k)$.
Fig. 4: Solution for the load given in Figure 3



Proposition 4.2 Assume relatively complete recourse for the dynamic recourse
problem. Let its optimal value be denoted by $F^*$ and the optimal value
of its dual problem by $\Theta^*$. Then there exists a constant $\rho$ such that the following
estimate holds true:

$$F^* - \Theta^* \le \Big( 2 \sum_{k=1}^{K} n_k (t_{k+1} - t_k) + 1 \Big) \rho.$$

Proof: The proof follows from Proposition 5.26 in [2]. We only have to
show that the assumptions (A1)-(A3) made there are satisfied in our situation.
(A1) is just the feasibility of the problem, which holds due to relatively
complete recourse. (A2) and (A3) are easily checked specifying the required
conditions. □

We consider the same dynamic recourse problem with a modified objective 
function: 

/ T 

[PCiM, u^d + SCu 

i=l t=l 

The objective function in this case represents the average costs per scenario- 
term. We have the same optimal solution for both problems and the duality 
gap becomes 

$$F^* - \Theta^* \le \frac{2 \sum_{k=1}^{K} n_k (t_{k+1} - t_k) + 1}{I N}\, \rho.$$

The latter inequality implies that the duality gap goes to zero as $I \to \infty$.
Consequently, the duality gap becomes small for large systems independently
of making the discrete approximation of the load finer ($N \to \infty$).



5 Lagrangian Relaxation for the Two-Stage 
Model 

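We consider the two-stage stochastic power production planning model elaborated
in Section 2.3 under the assumption that the fuel cost functions
exhibit the form (2.7). Setting $x := (u, p, s, w) \in \mathbb{R}^m$ and $\tilde x := (\tilde u, \tilde p, \tilde s, \tilde w) \in
L^\infty(\Omega, \mathcal{A}, \mu; \mathbb{R}^m)$, the optimization problem consists in minimizing the objective
function

$$F(x, \tilde x) := \mathbb{E} \sum_{i=1}^{I} \sum_{t=1}^{T} \big[ FC_{it}(p_i^t + \tilde p_i^t, \tilde u_i^t) + SC_{it}(\tilde u_i(t)) \big] \qquad (5.1)$$

over all decisions $x \in \mathbb{R}^m$ and $\tilde x \in L^\infty(\Omega, \mathcal{A}, \mu; \mathbb{R}^m)$ such that the unit
capacity limits (2.1), (2.2), (2.16), (2.17), the minimum down-time constraints
(2.15) and the loading and reserve constraints

$$\begin{aligned}
\sum_{i=1}^{I} p_i^t + \sum_{j=1}^{J} (s_j^t - w_j^t) &\ge \mathbb{E}(d^t), \\
\sum_{i=1}^{I} (p_i^t + \tilde p_i^t) + \sum_{j=1}^{J} \big( s_j^t + \tilde s_j^t - (w_j^t + \tilde w_j^t) \big) &\ge d^t, \\
\sum_{i=1}^{I} (u_i^t p_i^{\max} - p_i^t) &\ge r^t,
\end{aligned} \qquad (5.2)$$

respectively, are satisfied. The constraints (5.2) are coupling across units
while all remaining constraints are associated with the operation of single
(thermal or hydro) units. With a similar argument based on a duality
statement as in the previous section, we relax the constraints (5.2) by introducing
Lagrange multipliers $\lambda = (\lambda_1, \lambda_2, \lambda_3)$, where $\lambda_1, \lambda_3 \in \mathbb{R}^T$ and
$\lambda_2 \in L^1(\Omega, \mathcal{A}, \mu; \mathbb{R}^T)$. The dual problem is then of the following form:

$$\max \{ \Theta(\lambda) : \lambda \in \mathbb{R}^T \times L^1(\Omega, \mathcal{A}, \mu; \mathbb{R}^T) \times \mathbb{R}^T,\ \lambda \ge 0,\ \mu\text{-a.s.} \} \qquad (5.3)$$

where

$$\Theta(\lambda) := \inf \{ L(x, \tilde x; \lambda) : x \text{ and } \tilde x \text{ satisfy (2.1), (2.2) and (2.15) - (2.17)} \},$$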
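$$\begin{aligned}
L(x, \tilde x; \lambda) :={}& F(x, \tilde x) + \sum_{t=1}^{T} \lambda_1^t \Big( \mathbb{E}(d^t) - \sum_{i=1}^{I} p_i^t - \sum_{j=1}^{J} (s_j^t - w_j^t) \Big) \\
&+ \mathbb{E} \sum_{t=1}^{T} \lambda_2^t \Big( d^t - \sum_{i=1}^{I} (p_i^t + \tilde p_i^t) - \sum_{j=1}^{J} \big( s_j^t + \tilde s_j^t - (w_j^t + \tilde w_j^t) \big) \Big) \\
&+ \sum_{t=1}^{T} \lambda_3^t \Big( r^t - \sum_{i=1}^{I} (u_i^t p_i^{\max} - p_i^t) \Big) \\
={}& \sum_{i=1}^{I} \sum_{t=1}^{T} \Big[ \mathbb{E} \big\{ FC_{it}(p_i^t + \tilde p_i^t, \tilde u_i^t) + SC_{it}(\tilde u_i(t)) - \lambda_2^t (p_i^t + \tilde p_i^t) \big\} - (\lambda_1^t - \lambda_3^t)\, p_i^t - \lambda_3^t u_i^t p_i^{\max} \Big] \\
&- \sum_{j=1}^{J} \sum_{t=1}^{T} \Big[ \lambda_1^t (s_j^t - w_j^t) + \mathbb{E} \big\{ \lambda_2^t \big( s_j^t + \tilde s_j^t - (w_j^t + \tilde w_j^t) \big) \big\} \Big] \\
&+ \sum_{t=1}^{T} \big[ \lambda_1^t\, \mathbb{E}(d^t) + \mathbb{E}(\lambda_2^t d^t) + \lambda_3^t r^t \big].
\end{aligned}$$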

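Hence, the dual function $\Theta$ decomposes into the form

$$\Theta(\lambda) = \sum_{i=1}^{I} \Theta_i(\lambda) + \sum_{j=1}^{J} \Theta_j(\lambda) + \sum_{t=1}^{T} \big[ \lambda_1^t\, \mathbb{E}(d^t) + \mathbb{E}(\lambda_2^t d^t) + \lambda_3^t r^t \big]. \qquad (5.4)$$

Here $\Theta_i(\lambda)$ is the optimal value of a two-stage stochastic program for the
(single) thermal unit $i$, which has the form: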



min I E [FCit {pl+pi «|) - {p\ + p\) + SCu {ui{t)) 

- (AJ - A|)p? - A‘^xbi?“]} : p7^^ <p\+pI< 

P™‘"^i ^ Pi ^ and minimum down-times (2.15 )} 



(5.5) 



Introducing the optimal value function for the second-stage problem and 
taking into account the special form (2.7 ) of the fuel costs, the two-stage 
mixed-integer stochastic program (5.5 ) may be rewritten as 



min { E [(4 - Ai ) p 1 - A‘u^p;?“] -H E {$i(ui,Pi; A2)} : 

(5.6 ) 

pT4 <p\< p^^u\, t = 1, . . . , T, and (2.3 ) } , 
where

^iiui,pi]\ 2 ) := inf I max^/a(pf +p\) - A|(pf +pf)+ 

SCu{iiiit)) + Ciuf] : pf-u\ <p\+p\< P^‘^u% 
t = l,...,T, and (2.15 ) }. 

Since the minimization with respect to pi and pi {p- a. s.) in (5.5 ) or 
(5.6 ) can be performed explicitly, the models represent two-stage stochastic 
combinatorial programs and can be solved by dynamic stochastic program- 
ming. Problem (5.6 ) simplifies essentially for the case of I 2 = 0, i- e., 
u\=u\ (i = 1, . . . , t = 1, . . . , T), because the compensation program does 
not contain binary decisions, enjoys a separability structure and can be 
computed explicitly. In the latter case (5.6 ) takes the form 

min { Z + (A* - X\)p\ - (A*p^- - a)ul] + 

^ t=l 

E Z ht{u\,ph A|) : P^'M <P\< pT^u\, f = 1, . . . , T, and (2.3 )}, 

t=i ^ 

where 4it(Wi,P*; A|) := inf { max fu(p\ +p\) - Xl(p\ +p\) : 

P^^''u\<pI+p\<pT^uI }. 

The term 0j(A) in the representation (5.4 ) of the dual function 0 is the 
optimal value of the following stochastic pumped-storage subproblem for the 
plant j : 

min {-E(Ai+iE;(A*))(s5-i/;p+iB[EA|(s5-n)‘.) : 

I t=l lt=l J 

{sj,Wj) and {sj,Wj) satisfy 0 < 0 <w^j < ) 

t = and (2.2), (2.16)}. 

Problem (5.7 ) represents a linear two-stage stochastic program, which can 
be solved by standard solution techniques (cf. [17], [20]). 

These facts motivate a Lagrangian relaxation-based conceptual solution me- 
thod for the two-stage stochastic model, which is similar to the algorithm 
developed in the previous section. Its basic steps are: 

• Generation of scenarios $d_n$, $n = 1, \dots, N$, for the load process $d$ and
replacing $d$ by this discrete approximation;

• solving the concave dual problem (5.3) by applying appropriate nondifferentiable
optimization methods (cf. Section 3), where function values
and subgradients of $\Theta$ are computed by solving the single unit subproblems
(5.5) and (5.7). Note that (5.3) has dimension $2TN$ and $\Theta$ is
piecewise linear or quadratic.

• determining a primal feasible solution for the first-stage variables by a
procedure that is similar to the method described in Section 4.
Abstract. If a robot has to perform a specified manipulation task involving
intentional environmental contacts, a certain response behavior is desired to 
reduce strains and ensure successful completion without damage of the con- 
tacting bodies. On the other hand, the dynamic behavior of a manipulator 
depends strongly on its position and the gains of its joint controllers. Hence, 
varying these parameters for an optimized performance during manipulation 
seems to be an obvious task. In order to deal with impacts, oscillations and 
constrained motion, a model-based optimization approach is suggested, which 
relies on a detailed dynamic model of the manipulator incorporating finite
gear stiffnesses and damping. These models are used to define an optimiza- 
tion problem, which is then solved using numerical programming methods. It 
is illustrated with an assembly task, namely inserting a rigid peg into a hole 
with a PUMA 562 manipulator. The expected advantage in industrial ap- 
plications is a comparatively easy implementation, because performance can 
be improved by simply adjusting ’external’ parameters as mating position 
and coefficients of the standard joint controller. Particularly, no modifica- 
tions of the control architecture and no additional hardware are required. 
Application of the proposed approach to a rigid peg-in-hole insertion under 
practical constraints can reduce the measure for impact sensitivity by 17 %, 
that for mating tolerances by 78 % and the damping of end-effector oscilla- 
tions and motor torques by up to 79 %. These improvements are shown to 
be reproducible experimentally.



1 Introduction 



A great deal of work has been done in recent years for enabling robots to per- 
form complex manipulation tasks, where manipulation means that the robot 
interacts mechanically with its environment. High precision demands and the 
lack of sensory capabilities, but also the deficiency of realistic models in the
planning stage have to some extent prevented the automation of many tech- 
nical applications such as assembly, grinding, burring or surface polishing. 
In addition, handling devices specially designed for one single task are often
too costly. Thus, programmable multipurpose robots are used to meet the 
increased demand for flexibility and the question arises how such devices can 
be efficiently programmed to perform a task ’optimally’. Several approaches 
have been developed in order to tune a robot to the specific properties of a 
task, especially with regard to environmental interaction, see among others 
[6, 16]. 

Elaborate control schemes have been implemented including nonlinear con- 
trollers, adaptive and robust control [7, 19], as well as hierarchical structures 
incorporating elaborate force control strategies, [14, 17]. There are promis- 
ing approaches among them, which allow to perform a previously planned 
task fast and reliably, mainly because of their ability to detect possible errors 
during the task and take action to correct it. 

However, in the planning stage the occurrence of such errors can to a certain
extent be avoided by the use of model-based analysis and optimization tools. 
The free parameters, which can be adjusted for this purpose are position and 
trajectory of the robot, its control coefficients and some design properties of 
the parts to be assembled. Depending on the properties of the task the effects 
of disturbances and finite tolerances can be predicted and systematically 
optimized. 

Several measures have been developed for the judgement of the ability of 
a robot for manipulation, [1, 2, 9, 20]. In [20] a manipulability measure 
is defined, which gives an indication of how far the manipulator is from 
singularities and thus able to move and exert forces uniformly in all cartesian 
directions. These considerations were taken up in [2], where ellipsoids for 
force transmission, manipulability and impact magnitude are defined and 
visualized for a planar 4-DOF arm and a PUMA 560 robot. Asada [1] refers 
to the same effect as the virtual mass and pursues a concept, where the 
centroid of the end effector and its virtual mass are interactively optimized 
to achieve a desired dynamic behaviour. 

To overcome problems stemming from uncertainties in the relative position 
of the mating parts to each other, Pfeiffer [9] uses a quasistatic force equilib- 
rium between the robot’s end effector and the environment for the different 
possible contact configurations. Tolerance areas are calculated, which show 
the permissible deviation from the ideal path to ensure successful completion, 
depending on the cartesian stiffness of the robot’s tip. 

Assembly strategies incorporating the process emerged first from purely
geometric analyses [3]. Such process investigations were extended to
quasistatic investigations of the contact forces for different configurations
[15, 18], and interactions between process and robot were taken into account
[17]. With the ability to model structure-variant multibody systems
incorporating unilateral contacts, complete dynamic models of robot and
process including all interactions are available [8, 9, 13]. These detailed
models can be used to study the behavior of manipulators in connection with
all effects emerging from environmental interaction, such as impacts,
friction, possible stiction and constrained motion, see [17]. From these
models, criteria and constraints have been derived to form an optimization
problem for the automatic synthesis of a robotic assembly cell [11]. It soon
became clear that in optimizing manipulation tasks the problems shifted from
difficulties in building realistic process models to computational expense
and numerical convergence. Therefore, in the proposed approach
computationally expensive process models are not part of the underlying
model. Instead, they enter the formulation of the criteria through weighting
factors for the different cartesian directions.



2 System Model and Problem Description 

A typical feature in robotic manipulation tasks is the change in contact con- 
figuration between the corresponding workpieces. The resulting forces and 
moments acting on the end-effector influence the motion of the manipulator 
during the task. When modelling such processes, we can distinguish between 
the dynamic model of the robot and that of the process dynamics. 

Industrial robots suitable for complex assembly tasks have to provide at 
least six degrees of freedom and - to ensure flexible operation - a large 
workspace. We will therefore focus on manipulators with 6 rigid links and 
6 revolute joints, which are very common in industry. Such a robot can be 
modelled as a tree-structured multibody system, see Fig. 1.

Fig. 1: Dynamic robot model

The joints of the first three axes are considered elastic in order to take the
finite gear stiffnesses into account, which play an important role in precision
assembly. For this purpose a linear force law consisting of a spring-damper
combination combined with the gear ratio $i_{G,j}$, $j = 1, 2, 3$, is
assumed. The gears of the hand axes are considered stiff and the motion of one
arm and its corresponding motor is kinematically coupled.^

According to this, the robot possesses 9 degrees of freedom: 6 arm angles
and 3 free motor angles connected with the respective joints by a linear force
law,

\[
\tau_{A,j} = c_j \left( \frac{\gamma_{M,j}}{i_{G,j}} - \gamma_{A,j} \right)
           + d_j \left( \frac{\dot\gamma_{M,j}}{i_{G,j}} - \dot\gamma_{A,j} \right),
\qquad j = 1, 2, 3, \tag{1}
\]

where $\gamma_{M,j}$, $\gamma_{A,j}$ denote the angle of the $j$-th motor and
arm, respectively, relative to the previous body, see Fig. 1. $c_j$ and $d_j$
are stiffness and damping factors of the $j$-th gear and $i_{G,j}$ is the gear
ratio.
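
As a minimal sketch of the linear force law (1), the following Python function evaluates the torque transmitted through one elastic gear from the motor and arm angles. The function name, the argument names and all numerical values are illustrative assumptions; the sign convention (motor angle divided by the gear ratio, minus the arm angle) is one common choice and is not confirmed by the source.

```python
def gear_torque(gamma_m, gamma_a, dgamma_m, dgamma_a, c, d, i_g):
    """Torque transmitted to the arm by an elastic gear (spring-damper law).

    gamma_m, dgamma_m : motor angle [rad] and angular velocity [rad/s]
    gamma_a, dgamma_a : arm angle [rad] and angular velocity [rad/s]
    c, d              : gear stiffness [Nm/rad] and damping [Nms/rad]
    i_g               : gear ratio (assumed: motor turns per arm turn)
    """
    twist = gamma_m / i_g - gamma_a          # elastic deflection of the gear
    twist_rate = dgamma_m / i_g - dgamma_a   # rate of change of the deflection
    return c * twist + d * twist_rate


# Hypothetical values: motor turned by 1 rad behind a 100:1 gear, arm held at 0.
tau = gear_torque(gamma_m=1.0, gamma_a=0.0, dgamma_m=0.0, dgamma_a=0.0,
                  c=5000.0, d=20.0, i_g=100.0)
print(tau)
```

With both angular velocities set to zero only the spring term contributes, so the result is the stiffness times the gear deflection.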

Thus, for the vector of generalized coordinates

\[
q = [\gamma_{M,1}, \gamma_{M,2}, \gamma_{M,3}, \gamma_{A,1}, \dots, \gamma_{A,6}]^T \tag{2}
\]

the equations of motion for the robot with forces acting on the gripper can
be written as

\[
M(q)\,\ddot q + f(q, \dot q) = B\,\tau_C + W(q)\,\lambda \tag{3}
\]

with $M$ being the inertia matrix; $B$ and $W$ are the input matrices for the
motor torques ($\tau_C$) and gripper forces ($\lambda$), respectively.
$f(q, \dot q)$ is a vector containing the gravitational and centrifugal
forces. Let us assume the robot to be controlled by six PD joint controllers,
one for each joint, which are represented by

\[
\tau_C = -K_p\,(q_M - q_{Md}) - K_d\,(\dot q_M - \dot q_{Md}), \tag{4}
\]

where

\[
K_p = \mathrm{diag}[K_{p,1}, \dots, K_{p,6}], \quad
K_d = \mathrm{diag}[K_{d,1}, \dots, K_{d,6}], \quad
q_{Md} = [\gamma_{M1d}, \dots, \gamma_{M6d}]^T, \quad
q_M = B^T q,
\]





^In reality, the gears of the hand axes are elastic as well. However, the masses of the
wrist bodies are comparatively small and thus the associated natural frequencies are out
of the range of interest for our purposes.

domain        robot response behavior

criteria      impact sensitivity
              maximal robot force
              mating tolerance
              vibrational behavior

constraints   robot design:
                joint angle limitations
                joint torque limitations
              robot control:
                controller stability
                singularities
              practical demands in cell:
                workspace restrictions

Table 1: Optimization concept



with $K_{p,j}$, $K_{d,j}$ being the stiffness and damping control coefficients
of the $j$-th axis, referring to motor angles as inputs and motor torques as
outputs. $\gamma_{Mjd}$ is the motor angle of the $j$-th motor desired for a
given position.
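
The decentralised PD law of eq. (4) can be sketched in a few lines of Python. The function name and the error values are hypothetical; the gain values are the reference gains quoted later in eq. (20), whose units the source does not state.

```python
def pd_torques(q_m, q_md, dq_m, dq_md, k_p, k_d):
    """Joint-wise PD law: tau_j = -Kp_j (q_m_j - q_md_j) - Kd_j (dq_m_j - dq_md_j)."""
    return [
        -kp * (qm - qmd) - kd * (dqm - dqmd)
        for qm, qmd, dqm, dqmd, kp, kd in zip(q_m, q_md, dq_m, dq_md, k_p, k_d)
    ]


# Reference gains from eq. (20); the 0.01 position error per motor is an assumption.
K_P_REF = [1.604, 1.304, 2.608, 0.395, 0.556, 0.390]
K_D_REF = [0.055, 0.013, 0.019, 0.00263, 0.00280, 0.00195]

tau_c = pd_torques(q_m=[0.01] * 6, q_md=[0.0] * 6,
                   dq_m=[0.0] * 6, dq_md=[0.0] * 6,
                   k_p=K_P_REF, k_d=K_D_REF)
print(tau_c)
```

Because the gain matrices are diagonal, each joint torque depends only on the error of its own motor, which is why a plain element-wise loop is sufficient here.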

Supposing that the length of a trajectory for mating two parts together
is small compared to the robot’s characteristic measures, eq. (3) can be
linearized around a working point $q_0$ with $\dot q_0 = \ddot q_0 = 0$, which
yields for small deviations $q$ from $q_0$

\[
\begin{aligned}
M(q_0)\,\ddot q + P(q_0, K_d)\,\dot q + Q(q_0, K_p)\,q
  &= h(q_0) + B K_p\, q_{Md} + B K_d\, \dot q_{Md} + W(q_0)\,\lambda(q, \dot q), \\
P(q_0, K_d) &= \bar P(q_0) + B K_d B^T, \\
Q(q_0, K_p) &= \bar Q(q_0) + B K_p B^T,
\end{aligned} \tag{5}
\]

with damping matrix $\bar P(q_0)$ and stiffness matrix $\bar Q(q_0)$. $h(q_0)$
contains only the gravitational forces.

With this model, the parameters which affect the natural modes of vibration
of the manipulator, namely its position $q_0$ and control coefficients $K_p$
and $K_d$, are considered; in mathematical terms, they define an optimal
linearization point for the equations of motion (5). The goal is to find
$q_0$, $K_p$ and $K_d$ such that the system described by (5) behaves optimally
with respect to the criteria relevant for a specific process. $q_0$, $K_p$ and
$K_d$ can be optimized with only rough knowledge of the process to be carried
out. The resulting optimization problem is summarized in Table 1.

Optimization of robotic manipulation processes, particularly assembly, is
essentially a trade-off between different, sometimes contradictory, aims.
Optimizing for reduced sensitivity against gripper impacts, for example, may
deteriorate the behavior with respect to the maximal applicable gripper force
and vice versa. Thus, a set of criteria is established, with which a vector
optimization problem can be stated. The specific needs of different processes
are taken into account by the correct choice of criteria and weighting factors.
For this purpose, the functional-efficient set of solutions is calculated, from
which an optimal trade-off between the criteria can be chosen.



3 Criteria and Constraints 



The effects which influence the robot’s behaviour during an assembly task
have been worked out by a quasistatic and dynamic analysis of the robot
dynamics in conjunction with a detailed modelling of different mating
processes, see [8, 9, 13]. Scalar optimization criteria are derived from
them, the minimization of which yields an improvement of the system’s
performance with regard to the respective effect.



3.1 Optimization Criteria 



Impact sensitivity: When the mating parts come into contact with each
other, impacts are unavoidable. However, their intensity is proportional to
the effective mass

\[
m_{red} = \left[ w^T M^{-1} w \right]^{-1},
\]

reduced to the end effector, where $w$ is the projection of the impact
direction into the generalized coordinates and depends on the robot’s
position as well as on the cartesian impact direction. Fig. 2 shows an
ellipsoid at the robot’s gripper, from which the reduced end-point inertia
for each cartesian direction can be seen.

Fig. 2: Impact sensitivity in different cartesian directions

In order to reduce the impact sensitivity, the volume of that ellipsoid must
be minimized. Thus, we define the reduced endpoint inertia matrix $M_{red}$ as

\[
M_{red} = \left[
  \begin{bmatrix} J_{TG}(q_0) \\ J_{RG}(q_0) \end{bmatrix}
  M(q_0)^{-1}
  \begin{bmatrix} J_{TG}(q_0) \\ J_{RG}(q_0) \end{bmatrix}^T
\right]^{-1}, \tag{6}
\]

where $J_{TG}(q_0)$ and $J_{RG}(q_0)$ are the gripper’s Jacobians of
translation and rotation with respect to a coordinate frame fixed at the
gripper. Depending on the specific needs of the mating process, a $6 \times 6$
diagonal positive semidefinite matrix $g_M$ of weighting factors is introduced
for the trade-off between the cartesian directions. In $g_M$, directions in
which impacts will occur during manipulation can be emphasized. Thus,
geometrically, the ellipsoid will be squeezed or rotated during an
optimization from directions in which the mating process considered is
sensitive against impacts into directions in which impacts are not likely to
occur. Therefore, as an optimization criterion for the minimization of impact
intensities in the sensitive directions

\[
G_1 = \| g_M M_{red} \| \tag{7}
\]

is stated, with $\|A\| = \sqrt{\mathrm{trace}(A^T A)}$ being the Frobenius
norm of $A$.
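
To make the impact criterion more concrete, the following NumPy sketch evaluates a reduced end-point inertia, the weighted Frobenius-norm criterion and the effective mass along one direction. It assumes a square, regular Jacobian and uses a hypothetical two-degree-of-freedom toy model; the function names and numbers are not taken from the source.

```python
import numpy as np


def reduced_inertia(J, M):
    """Reduced end-point inertia (J M^-1 J^T)^-1 for a square, regular J."""
    return np.linalg.inv(J @ np.linalg.solve(M, J.T))


def impact_criterion(J, M, g_m):
    """Frobenius norm of the direction-weighted reduced inertia."""
    return np.linalg.norm(g_m @ reduced_inertia(J, M), ord="fro")


def effective_mass(J, M, n):
    """Effective mass seen by an impact along the Cartesian unit direction n."""
    w = J.T @ n                          # projection into generalised coordinates
    return 1.0 / (w @ np.linalg.solve(M, w))


# Hypothetical 2-DOF toy model (not the PUMA 562 of the example).
J = np.array([[1.0, 0.0], [0.0, 2.0]])   # gripper Jacobian
M = np.diag([2.0, 8.0])                  # joint-space inertia matrix
g_m = np.diag([1.0, 0.0])                # emphasise only the first direction

print(reduced_inertia(J, M))
print(impact_criterion(J, M, g_m))
print(effective_mass(J, M, np.array([0.0, 1.0])))
```

Setting a diagonal entry of the weighting matrix to zero removes the corresponding direction from the criterion, so an optimizer is free to let the inertia grow there.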

Maximal applicable mating force $\lambda_{max}$ in the direction of insertion:
The upper bound for the applicable mating force $\lambda$ is defined by the
maximum torque of each motor multiplied by the resulting lever arms.
$\lambda_{max}$ can thus be evaluated from

\[
\lambda_{max} = \min_{i} \{ \, \dots \, \}, \qquad i = 1, \dots, n_M, \tag{8}
\]

where $n_M$ is the number of driven axes and $n$ is a unit vector denoting the
cartesian insertion direction. $h_i(q_0)$ is the torque necessary at joint $i$
to balance the gravitational forces and is equal to the $i$-th component of
$h(q_0)$. For a maximization of $\lambda_{max}$, its inverse is taken as the
second criterion

\[
G_2 = \frac{1}{\lambda_{max}}. \tag{9}
\]

Mating tolerance: The deviation $\Delta x_G$ from the desired path for a given
static force depends on the endpoint stiffness, reduced to the cartesian
gripper coordinates [9], see Fig. 3.

Fig. 3: Force equilibrium for mating tolerance

A quasistatic force equilibrium at the gripper yields

\[
\Delta x_G = Q_{red}^{-1} \lambda + \Delta x_P, \qquad
Q_{red} = \left[
  \begin{bmatrix} J_{TG} \\ J_{RG} \end{bmatrix}
  Q^{-1}
  \begin{bmatrix} J_{TG} \\ J_{RG} \end{bmatrix}^T
\right]^{-1}. \tag{10}
\]

$\Delta x_P$ is the deviation resulting from the clearance between the two
parts and from the stiffness of the parts themselves. Therefore it depends
only on the mating process itself and need not be considered here. For a
maximization of $\Delta x_G$, the reduced stiffnesses $Q_{red}$ in the lateral
directions must be minimized. Together with a weighting factor $g_Q$, which
contains the cartesian directions in which the tolerances are critical, this
forms the criterion for the maximization of the mating tolerance

\[
G_3 = \| g_Q Q_{red} \|. \tag{11}
\]

It should be noted that, for translational deviations, mainly the directions
perpendicular to the insertion direction should be emphasized by $g_Q$ and all
rotational directions can possibly be contained, whereas the cartesian
stiffness in the insertion direction does not contribute to the mating
tolerance and should be high for a reduced path deviation.

Disturbance and tracking properties: When excited by disturbances (e.g. by
impacts), the gripper performs oscillations, the amplitude and damping of
which depend strongly on the robot’s position and joint controller. On the
other hand, a desired force or motion must be transmitted to the end effector
as directly as possible. All this must be performed with as little
expenditure of energy as possible. To meet these requirements, the equations
of motion (5) are evaluated by time simulations for certain test inputs and
three integral criteria are formulated:

(1) Damping of force induced gripper oscillations:

\[
G_4 = \int_0^\infty x^T(t)\, g_S\, x(t)\, dt, \tag{12}
\]

where $x(t) = \begin{bmatrix} J_{TG}(q_0) \\ J_{RG}(q_0) \end{bmatrix} q(t)$
and $\lambda(t) = [0, 0, \delta(t), 0, 0, 0]^T$ represents a unit impulse
input to (5) in the direction of insertion. $g_S$ in (12) gives the trade-off
between end-effector oscillations in the different cartesian directions.

(2) Transmission of a desired force to the end-effector:

\[
G_5 = \int_0^\infty (\lambda - \lambda_d)^T g_F\, (\lambda - \lambda_d)\, dt, \tag{13}
\]

where $\tau_{C,d}(t)$ contains the motor torques needed to exert the desired
end effector forces. $W$ is the projection from working space into
configuration space and $I_G$ denotes the matrix of gear ratios. As a test
signal $\lambda_d(t) = [0, 0, \sigma(t), 0, 0, 0]^T$ is used, where
$\sigma(t)$ is the unit step function.

(3) Joint torques: A perfect damping of gripper oscillations and perfect
tracking properties would require infinite joint torques. Thus, as soon as
control coefficients are being optimized, the necessary torques must be
considered. The performance criterion to be minimized is

\[
G_6 = \int_0^\infty \tau_C^T\, g_\tau\, \tau_C\, dt \tag{14}
\]

with the same disturbance as in (12) and $\tau_C$ being the joint torques from
eq. (4).
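
The integral criteria above are evaluated by time simulation. The sketch below shows, under simplifying assumptions, how a criterion of the type (12) could be approximated for a linear second-order model: a unit force impulse is replaced by the equivalent initial velocity, the state is advanced with a classical Runge-Kutta scheme and the integrand is summed with the trapezoidal rule. The function name, the one-degree-of-freedom test system and the integration settings are assumptions, not the authors' implementation.

```python
import numpy as np


def impulse_criterion(M, P, Q, W, J, g, direction, t_end=40.0, dt=5e-3):
    """Approximate the integral of x^T g x over time after a unit force impulse.

    Model: M q'' + P q' + Q q = W * lambda, started from rest. A unit impulse
    along `direction` is equivalent to the initial velocity M^-1 W direction.
    """
    n = M.shape[0]
    # First-order form z' = A z with z = [q, q'].
    A = np.block([[np.zeros((n, n)), np.eye(n)],
                  [-np.linalg.solve(M, Q), -np.linalg.solve(M, P)]])
    z = np.concatenate([np.zeros(n), np.linalg.solve(M, W @ direction)])

    def cost(state):
        x = J @ state[:n]                 # Cartesian gripper displacement
        return x @ g @ x

    total, prev = 0.0, cost(z)
    for _ in range(int(round(t_end / dt))):
        # Classical fourth-order Runge-Kutta step for the linear system.
        k1 = A @ z
        k2 = A @ (z + 0.5 * dt * k1)
        k3 = A @ (z + 0.5 * dt * k2)
        k4 = A @ (z + dt * k3)
        z = z + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        cur = cost(z)
        total += 0.5 * dt * (prev + cur)  # trapezoidal rule
        prev = cur
    return total


# Hypothetical 1-DOF test system with unit mass and stiffness.
one = np.array([[1.0]])
for damping in (1.0, 2.0):
    value = impulse_criterion(one, damping * one, one, one, one, one,
                              direction=np.array([1.0]))
    print(damping, value)
```

For this scalar test system the integral has the closed form 1/(2 p k) for unit mass, damping p and stiffness k, which gives a convenient check of the numerical result and shows that a larger damping lowers the criterion.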



The above list of optimization criteria is of course not a complete list of
possible objectives for robotic optimization. However, for a large class of
manipulation tasks, a combination can be found suitable for the specific
properties of the process. For any process the cartesian weighting factors
$g_M$, $g_Q$, $g_S$, $g_F$, $g_\tau$ are chosen from physical evidence: e.g.
the impact intensities during insertion of a rigid peg into a hole will be
worst in the cartesian $z_G$-direction. Thus, a large weight must be imposed
on it in $g_M$.

3.2 Constraints 

In order to obtain sensible results which can be utilized in practice, certain
constraints have to be imposed on the optimization problem. The highly
nonlinear programming problem defined by (7) to (14) shows good convergence
only when it is "properly" constrained, i.e. when the parameters are
restricted to an area where a minimum of the criteria can be reliably found.
The constraints are in detail:



• The linearized equations of motion (5), 



• Joint angle limitations:

\[
q_{min} \le q_0 \le q_{max}, \tag{15}
\]

• Joint torque limitations:

\[
\tau_{C,min} \le \tau_C \le \tau_{C,max}. \tag{16}
\]

Joint angle and joint torque limitations for our example are chosen
according to those of the PUMA 562 robot.

• Stability of the controller used: As soon as control coefficients
are being optimized, stability of the resulting system must be assured
by suitable constraints. Since the linearized robot dynamics are
time-invariant, the eigenvalues of the dynamic system matrix derived
from eq. (5) are calculated and their real parts are restricted to be
negative.

• The proximity to singularities must be avoided. In such positions,
the robot would not be able to move in the desired manner and the
obtained results would be without any practical relevance. Furthermore,
some of the optimization criteria tend to infinity at singular
positions. Thus, punching out finite regions around them improves
the condition of the optimization problem. As a measure the condition
number $\kappa$ of the end effector’s Jacobian is used, which is defined
by $\kappa(A) = \|A\| \|A^{-1}\|$ and tends to infinity as the Jacobian
becomes singular:

\[
\kappa\left( \begin{bmatrix} J_{TG}(q_0) \\ J_{RG}(q_0) \end{bmatrix} \right) < \varepsilon. \tag{18}
\]

In the example $\varepsilon = 20$ is chosen.

• In any industrial application, the position and orientation of the
gripper are restricted by external constraints, such as obstacles within
the working space, or the requirement that the parts should be assembled
on a workbench with a given height. Position and orientation are
calculated using the robot’s forward kinematics, so that geometrical
constraints can be stated in Cartesian space. For simplicity in our
example, we restrict the robot’s position to a cube, the edges of which
are parallel to the base coordinate frame $B$ of the robot, see Fig. 4:

\[
{}_B r_{min} \le {}_B r_G(q_0) \le {}_B r_{max}. \tag{19}
\]

Fig. 4: Working space restrictions

Orientation restrictions are expressed using the rotational gripper
transform $A_{GB}$. In the example, we choose the orientation to be
restricted such that the $z_G$ direction should have a negative component
in each of the $x_B$- and the $z_B$-directions, which means that the mating
direction points downwards away from the robot’s base.
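
Several of the constraints listed above reduce to simple numerical tests. The following NumPy sketch, with hypothetical function names and toy matrices, checks the stability of a linear second-order model through the eigenvalues of its first-order system matrix, the distance from singularities through the condition number of a square Jacobian (in the Frobenius norm, which is assumed here because it is the norm defined for the criteria), and box constraints such as (15), (16) and (19).

```python
import numpy as np


def is_stable(M, P, Q):
    """True if all eigenvalues of the first-order system matrix have negative real part."""
    n = M.shape[0]
    A = np.block([[np.zeros((n, n)), np.eye(n)],
                  [-np.linalg.solve(M, Q), -np.linalg.solve(M, P)]])
    return bool(np.all(np.linalg.eigvals(A).real < 0.0))


def far_from_singularity(J, eps=20.0):
    """True if the Frobenius condition number of the square Jacobian J is below eps."""
    return bool(np.linalg.cond(J, "fro") < eps)


def within_box(value, lower, upper):
    """True if lower <= value <= upper holds component-wise."""
    value, lower, upper = (np.asarray(a, dtype=float) for a in (value, lower, upper))
    return bool(np.all(lower <= value) and np.all(value <= upper))


# Hypothetical toy data.
I2 = np.eye(2)
print(is_stable(I2, I2, I2))                        # damped system
print(is_stable(I2, -I2, I2))                       # negative damping
print(far_from_singularity(np.diag([1.0, 0.01])))   # nearly singular Jacobian
print(within_box([0.2, 0.5, 0.9], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]))
```

The threshold of 20 is the value quoted for the example; the remaining numbers only illustrate how the tests behave.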
3.3 Example: Rectangular Peg-in-Hole Insertion

We illustrate the method with the position controlled insertion of a rigid
rectangular peg into a hole using a PUMA 562 manipulator, starting from
the reference configuration $q_{0,ref}$, $K_{p,ref}$, $K_{d,ref}$ defined in
eq. (20). This configuration is characterized by short effective lever arms
that disturbances can work on, and small control coefficients, which increase
gripper compliance for improved mating tolerance:

\[
\begin{aligned}
q_{0,ref} &= [\; 2 \quad -152 \quad -4 \quad 0 \quad -19 \quad 179 \;]^{\circ} \\
K_{p,ref} &= [\; 1.604 \quad 1.304 \quad 2.608 \quad 0.395 \quad 0.556 \quad 0.390 \;]^T \\
K_{d,ref} &= [\; 0.055 \quad 0.013 \quad 0.019 \quad 0.00263 \quad 0.00280 \quad 0.00195 \;]^T
\end{aligned} \tag{20}
\]



Rigid peg-in-hole insertion is mainly characterized by rigid body contacts,
the occurrence of which cannot be predicted because of the limited positioning
accuracy of the gripper. Thus, peg and hole will show lateral and angular
offsets relative to each other. This causes impacts between the peg and the
chamfer, which result in gripper oscillations. On the other hand, compliance
in the lateral directions is required in order to compensate for positioning
errors. Therefore, for a rectangular peg-in-hole process, the criteria for
impact sensitivity, mating tolerance and damping of gripper oscillations are
the most relevant ones. For our case study we chose the vector of objective
functions

\[
G = \begin{bmatrix}
  G_1 / G_{1,0} \\
  G_3 / G_{3,0} \\
  (G_4 + G_6) / (G_{4,0} + G_{6,0})
\end{bmatrix} \tag{21}
\]

normalized to the objective function values $G_{i,0}$, $i = 1, 3, 4, 6$, of
the reference configuration $q_{0,ref}$, $K_{p,ref}$ and $K_{d,ref}$. For an
appropriate choice of cartesian weighting factors, let us assume the lateral
clearance in $x$-direction between the two parts to be smaller than the
robot’s positioning accuracy. In $y$-direction the clearance is assumed to be
large enough to avoid contact with chamfers. Thus, impacts will occur mainly
in $x$- and $z$-direction, and optimizing for impact intensities means finding
a position where the effective end-effector masses in $x$ and $z$ are
minimized. The mating tolerance for this process is determined by the
$x$-translational and $y$-rotational cartesian stiffnesses. Also the vibration
behavior is most critical in $x$- and $z$-direction, and the weights for the
motor torques are chosen according to the maximum motor torques of the
PUMA 562 robot. Thus, the cartesian weights for our example problem read

A sensitivity analysis was carried out, from which we can conclude that the
considered objectives are in accordance with physical evidence. Hence they
reflect the physical behavior of the system in the sense that they become a
minimum at the locations where performance is best. On the other hand,
they are highly nonlinear functions of the optimization parameters. This
forms a nonlinear, nonconvex optimization problem. Moreover, some of the
cost functions tend to infinity at singularities, showing very high
curvatures. Thus, for an efficient optimization, analytical derivatives of
those objective functions for which a calculation is possible at a reasonable
cost may significantly improve convergence. This is done for $G_1$, $G_2$ and
$G_3$ and for the singularity (18) and working space constraints (19) using
analytical calculation software. The objective functions possess local
minima, to which the optimization routine may converge, so that an
optimization of the position makes sense only if the problem is constrained
to a certain region within the working space. However, in most cases
practical considerations in a real environment yield working space
restrictions anyway.



4 Vector Optimization Problem 



The above mentioned criteria and constraints form a nonlinear vector problem
for the position/controller optimization. The manipulation task to be carried
out is characterized by a specific combination of cost functions and
weighting factors, which can be chosen by physical evidence, as shown
before. Thus, the complete vector problem for our example reads

\[
\min_{q_0, K_p, K_d} \{ G : f_1 = 0;\; f_2 \le 0 \} \tag{23}
\]

with $f_1(q_0, K_p, K_d)$ stemming from (5) and $f_2(q_0, K_p, K_d)$ being

\[
f_2 = \begin{bmatrix}
  \text{joint angles} \\
  \text{joint torques} \\
  \text{controller stability} \\
  \text{avoid singularities} \\
  \text{workspace} \\
  \text{mating direction}
\end{bmatrix} \tag{24}
\]

It is known from the theory of vector optimization that (23) cannot be
uniquely solved if any of the components of $G$ are competing [4]. Rather,
the solution of (23) is a subspace of the parameter space and denotes the
Pareto-optimal set of all possible solutions satisfying $f_1 = 0$,
$f_2 \le 0$. Pareto-optimality is reached if none of the objective functions
can be improved without deteriorating at least one of the other criteria.
Using the method of objective weighting [5], a substitute problem is stated
with a scalar preference function $P$ to be minimized. For this, a vector of
weighting factors $w = [\, w_1 \;\; w_3 \;\; w_{46} \,] \in \mathbb{R}^3$ is
introduced such that

\[
0 \le w_i \le 1; \qquad \sum_{i = 1, 3, 46} w_i = 1; \qquad
P(G(q_0, K_p, K_d), w) = w\, G(q_0, K_p, K_d). \tag{25}
\]

The Pareto-optimal set of solutions is then obtained by solving the scalar
substitute problem

\[
\min \{ P : f_1 = 0;\; f_2 \le 0 \} \tag{26}
\]

for each vector $w$ which fulfills (25). Eq. (26) is solved for a systematic
variation of $w$ using a Sequential Quadratic Programming algorithm with the
Hessian matrix of the Lagrangian function being updated at each iteration
by a quasi-Newton approximation (BFGS) [10, 12].
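
The method of objective weighting can be illustrated with a small Python sketch. It enumerates weight vectors that satisfy the conditions of (25) on a regular grid and, instead of the Sequential Quadratic Programming solver used in the paper, simply picks the best of a finite set of candidate designs for each weight vector; this brute-force selection is a stand-in chosen for brevity. The trade-off values are the normalised criteria quoted in eq. (27), the reference design equals one in every criterion by construction, and the third design is purely hypothetical.

```python
from itertools import product


def weight_vectors(steps):
    """All w = (w1, w3, w46) on a regular grid with 0 <= w_i <= 1 and sum(w) = 1."""
    return [(i / steps, j / steps, (steps - i - j) / steps)
            for i, j in product(range(steps + 1), repeat=2) if i + j <= steps]


def preference(G, w):
    """Scalar preference function P = w . G."""
    return sum(wi * gi for wi, gi in zip(w, G))


def weighted_sum_winners(designs, steps=10):
    """Labels of the designs that minimise P for at least one weight vector."""
    winners = set()
    for w in weight_vectors(steps):
        winners.add(min(designs, key=lambda name: preference(designs[name], w)))
    return winners


designs = {
    "reference": (1.0, 1.0, 1.0),      # normalised reference configuration
    "trade-off": (1.24, 0.36, 0.33),   # values quoted in eq. (27)
    "poor": (1.3, 1.2, 1.1),           # hypothetical dominated design
}
print(preference(designs["trade-off"], (0.6, 0.0, 0.4)))
print(sorted(weighted_sum_winners(designs)))
```

A design that is worse than another one in every criterion can never win for any admissible weight vector, which is why sweeping the weights traces out Pareto-optimal candidates only.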

5 Optimization Results 

In the following considerations, the reference configuration from (20), which
is already considered suitable for the regarded process, is used as starting
point for the optimization. It is then compared to an ’optimal trade-off’
configuration, which is chosen from the set of Pareto-optimal solutions. As
shown in Fig. 5, the single cost functions $G_i$ can be considerably
diminished with respect to the reference configuration if they are emphasized
in the preference function $P$. The criterion for impact intensities can be
reduced by at most 17 %, that for mating tolerances by 78 % and the damping of
end-effector oscillations and motor torques by up to 79 % with respect to the
reference point.

Fig. 5: Pareto-optimal set of solutions

However, it is evident that such tremendous improvements cause deteriorations
in other criteria. For example, optimizing only for damping of oscillations
deteriorates the impact sensitivity by 72 %, and optimizing only for mating
tolerance increases the oscillation criterion by 136 %. But there are also
regions within the Pareto-optimal area where all criteria are improved with
respect to the reference configuration. Fig. 5 shows that over a wide range of
possible weighting factors the criteria for impact sensitivity and for mating
tolerance are not contradictory to each other: a simultaneous improvement of
both cost functions can be observed for a large number of possible weights.
Only if $G_1$ is strongly weighted in $P$ does $G_3$ become worse. On the
other hand, $G_{46}$ is found to impose completely different demands on the
position and on the controller. $G_1$ and $G_3$ have their largest value at
the point where $G_{46}$ is minimized.
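
The notions used in this discussion, Pareto-optimality and simultaneous improvement over the reference, can be stated compactly in code. The sketch below filters a list of normalised criteria vectors for non-dominated points and flags those that improve on the reference in every criterion. Only the vector (1.24, 0.36, 0.33) is taken from the paper; the other points are hypothetical and serve to exercise the functions.

```python
def dominates(a, b):
    """True if a is no worse than b in every criterion and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Points that are not dominated by any other point (all criteria minimised)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def improves_all(point, reference=(1.0, 1.0, 1.0)):
    """True if the point is better than the reference in every criterion."""
    return all(x < r for x, r in zip(point, reference))


points = [
    (1.0, 1.0, 1.0),      # reference configuration (normalised)
    (1.24, 0.36, 0.33),   # trade-off quoted in the paper
    (0.83, 0.9, 1.5),     # hypothetical: impact criterion emphasised
    (0.95, 0.8, 0.9),     # hypothetical: all criteria improved
    (1.3, 1.2, 1.1),      # hypothetical: worse than the reference everywhere
]
front = pareto_front(points)
print(front)
print([p for p in front if improves_all(p)])
```

In this toy data set the reference itself drops out of the front because one candidate is better in all three criteria, mirroring the observation that parts of the Pareto-optimal region improve every criterion at once.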

From these considerations it is clear that an ’optimal trade-off’ must be
found, which gives a satisfactory improvement in each of the criteria. The
process of finding this optimal trade-off can hardly be formalized in a
mathematical sense, because the trade-off between the cost-function weights is
generally governed by criteria which require human expertise. Thus, the
Pareto-optimal region in Fig. 5 has to be judged in order to find an optimal
solution. In our peg-in-hole example $w = [\, 0.6 \;\; 0 \;\; 0.4 \,]$ is
chosen, which yields the following configuration:

\[
\begin{aligned}
q_{0,opt} &= [\; 3.3 \quad -165.9 \quad 1.1 \quad 92.3 \quad -39.0 \quad 72.5 \;]^{\circ} \\
K_{p,opt} &= [\; 1.226 \quad 1.219 \quad 1.000 \quad 0.132 \quad 0.139 \quad 0.133 \;]^T \\
K_{d,opt} &= [\; 0.0715 \quad 0.0331 \quad 0.0145 \quad 0.0016 \quad 0.0024 \quad 0.0074 \;]^T \\
G_{opt} &= [\; 1.24 \quad 0.36 \quad 0.33 \;]^T
\end{aligned} \tag{27}
\]

The position resulting from $q_{0,opt}$ is depicted in Fig. 6, compared to the
reference position $q_{0,ref}$.

Fig. 6: Robot position for ’optimal trade-off’ and reference
configuration (panels: reference position, optimized position)

Fig. 7: Impact sensitivity ellipsoid for ’optimal trade-off’
compared to reference configuration (panels: reference configuration,
optimized configuration)

It can be seen from $G_{opt}$ in (27) that significant improvements in $G_3$
and $G_{46}$ can be expected, which must be paid for by a slight deterioration
in $G_1$.

Fig. 8: Tolerance area for $\lambda_{max} = 10\,\mathrm{N}$ for ’optimal
trade-off’ compared to reference configuration

Fig. 9: Transfer function $f$ ($z_G$ gripper displacement related to $z_G$
gripper force) for ’optimal trade-off’ compared to reference configuration

This is confirmed by Figs. 7 to 9. The volume of the impact ellipsoid has
slightly increased and the main axis is rotated with respect to the $y$-axis
by a small angle. The tolerance area for a given maximal mating force of
$\lambda_{max} = 10\,\mathrm{N}$, calculated from a quasistatic force
equilibrium [9], is significantly enlarged, Fig. 8. For the judgement of the
disturbance behavior, the amplitude frequency response function for $z_G$
gripper displacement related to $z_G$ gripper force is depicted in Fig. 9. The
resonance peak at the first natural frequency vanishes completely, which
indicates that gripper oscillations are well damped. However, the starting
amplitude is increased. This is due to
the trade-off with the mating tolerance criterion, which reduces the cartesian
end-effector stiffness.

6 Experimental Verification

The optimized configuration is verified experimentally using the test setup
depicted in Fig. 10. A rigid rectangular peg is assembled into a rigid hole
with a clearance of 0.3 mm in $x_G$-direction and an $x_G$ offset of 2 mm
using a PUMA 562 manipulator. The desired mating path is a straight line in
$z_G$-direction with a length of 60 mm and an assembly time of 0.4 s. Forces
are measured using a Schunk FTS 330/30 force-torque sensor installed at the
robot’s end-effector. The gripper position is reconstructed from the measured
joint encoder angles using the robot’s forward kinematics.

Fig. 10: Experimental test-setup

For the judgement of impacts, the time history of the $z_G$ gripper force is
considered, Fig. 11. A significant force peak occurs at the time where the
two parts come into contact for the first time, the height of which gives a
measure for the impact intensity. Since the relative velocity with which the
parts meet is almost equal in both cases (about 150 mm/s), the peak height
gives a direct measure of the effective mass acting on the impacting bodies.
Fig. 11 shows that in the regarded direction similar impact intensities can
be expected and thus the ’optimal trade-off’ yields no improvement with
respect to the impact behavior, which is also reflected in $G_{opt}$ and in
Fig. 7.

In contrast to this, according to the values in $G_{opt}$ and the tolerance
area of Fig. 8, the optimal trade-off must show significant advantages with
regard to the mating tolerance. This is verified with time histories of the
$x_G$ lateral gripper force during insertion, Fig. 12. Although the lateral
offset is approximately equal in both cases, the reduced lateral end-effector
stiffness of the optimized system allows compliant motion and thus reduces
the strains on the manipulator and on the mating parts significantly. Most of
the improvement in this criterion is due to the change in control
coefficients, since the end-effector stiffness is essentially determined by
$K_p$ and $K_d$. The manipulator position $q_0$ defines the gripper’s Jacobian
and thus the lever arms the compliant controllers can work on.

Fig. 11: Gripper force in mating direction during insertion
for ’optimal trade-off’ compared to reference configuration

In Fig. 13 the $z_G$ path deviation due to external forces is depicted for
both the reference and the optimized configurations. In fact, two different
sources exist which excite the manipulator dynamics: external contact forces
and the desired movement. In order to separate those two effects in the
experiment, the trajectory is measured twice for each configuration. First,
the desired trajectory is performed without any external forces, particularly
with no contacts. The resulting path deviation is then subtracted from the
path deviation measured during manipulation. This ensures that only the path
deviation resulting from contact events shows up in Fig. 13. It can be seen
that the first amplitude peak resulting from the initial impact is reduced by
a factor of 3. Furthermore, the transient behavior is much better damped in
the optimized configuration and shows no oscillations. Thus, the vibrational
performance is also significantly improved by the optimization.

Fig. 12: Lateral gripper force during insertion for ’optimal
trade-off’ compared to reference configuration

Fig. 13: $z_G$ path deviation in mating direction due to mating
forces during insertion for ’optimal trade-off’ compared to reference
configuration (horizontal axis: time [s])
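
The separation of contact-induced path deviation from the deviation caused by the commanded motion, as described for Fig. 13, amounts to a sample-wise subtraction of two measurement records. The following sketch assumes both records were taken on the same time grid; the function names and the sample values are hypothetical and do not reproduce the measured data.

```python
def contact_deviation(with_contact, free_motion):
    """Sample-wise difference of two path-deviation records on the same time grid."""
    if len(with_contact) != len(free_motion):
        raise ValueError("both records must have the same number of samples")
    return [c - f for c, f in zip(with_contact, free_motion)]


def peak_amplitude(signal):
    """Largest absolute value of a record."""
    return max(abs(v) for v in signal)


# Hypothetical path deviations in mm, sampled at identical instants.
free_run = [0.00, 0.10, 0.20, 0.10, 0.00]        # trajectory without contact
contact_run = [0.00, 0.10, 0.80, -0.20, 0.05]    # trajectory with peg contact

difference = contact_deviation(contact_run, free_run)
print(difference)
print(peak_amplitude(difference))
```

In practice the two runs would first have to be aligned in time; that step is omitted here.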
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7 Conclusions 



We have shown the analysis and automatic synthesis of an automated as- 
sembly cell using numerical programming methods. The robot’s response 
behavior is optimized by choosing appropriate position and controller gains. 
It was shown that the criteria give a good performance measure with respect 
to the physical effects to be optimized. Constraints have been introduced, 
which make the solution of the problem applicable in practice. The cost 
functions were found to be nonlinear and nonconvex functions of the opti- 
mization parameters and show local minima. Thus, for good convergence 
analytic function derivatives are necessary. The resulting vector problem is 
solved and the Pareto-optimal set of solutions is discussed for a rigid peg-in- 
hole insertion carried out by a PUMA 562 manipulator. Significant improve- 
ments can be gained in each of the criteria. However, they must be paid for 
with sometimes large deteriorations in other objectives. Impact sensitivity 
and mating tolerance are to a wide extent not contradictory to each other, 
whereas optimization for vibrational behavior imposes opposite demands on 
the natural dynamics of the manipulator, which makes a trade-off necessary. 
The performance of a compromise configuration is discussed. Although this 
configuration shows a slight deterioration for the impact sensitivity criterion 
with respect to a refernece, mating tolerance and vibratory behavior can be 
to a large extent improved. This is tested experimentally and the results 
show that the improvements gained by the optimizaiton can be reproduced 
in practice. 
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Abstract. Particularly in view of the introduction of product liability, reliability- 
based design procedures, and for that matter optimization (RBO) receive increasing 
attention. The analysis deals with statistical uncertainties inherent in structural, 
material, damage parameters, etc., which are modeled by random variables. The 
mechanical representation of the structure and the component respectively is 
generally modeled by Finite Elements (FE). In this paper basic mathematical 
formulations of design objectives and restrictions including reliability measures are 
discussed. Based on these models structural reliability analyses provide information 
for design modification and selection of an optimal design solution. In general, 
minimization of the expected total cost of the structure, including initial costs and 
costs due to failure, minimization of the overall probability of failure, and weight 
minimization with respect to reliability constraints are considered. Design 
problems commonly denoted as multiobjective or multicriteria optimization 
problems are treated. For the reliability analysis numerical methods are utilized to 
estimate the reliability measures. These procedures are already cast in easy-to-use 
software, denoted as COSSAN (Computational Stochastic Structural 
Analysis). In this context a concept is discussed, which is based on the separation 
of the tasks of reliability analysis and nonlinear mathematical programming 
techniques for which pertinent applicable software is already available. The RBO 
procedure utilizes approximation techniques for estimating the reliability 
measures. In particular, the reliability analysis makes use of the well known 
Response Surface Method (RSM) in context with Advanced Monte Carlo 
Simulation techniques, while the reliability based optimization procedure itself is 
controlled by the well known NLPQL-algorithm. Finally a number of numerical 
applications are shown in order to exemplify the approach. 



1 Introduction 

In the fields of structural optimization and reliability theory considerable progress 
has been made particularly in recent years. Deterministic optimization in structural 
and mechanical design as well as in other engineering disciplines has been used as 
a decision tool in order to obtain the best design. The requirement of identifying a 
"best design" includes that the risk of failure of mechanical components, systems 
and structures is considered in the decision making. Thus, the quantification of 
safety and reliability of structural systems gained considerable importance. Hence, 
a deterministic analysis of usually complex structures neglecting the random 
nature (uncertainties in loading, strength, etc.) is not satisfactory. As a consequence, 
the structural design process has to be improved by investigation of efficient 
reliability-based optimization methods. The combination of reliability analysis and 
structural optimization becomes very important for the rational and quantitative 
comparison between economy and safety. This fact was first recognized by A.M. 
Freudenthal (e.g. [3, 4]). 

The performance of structural analysis is generally based on mechanical and 
mathematical models. In most cases the mechanical model of the structure is 
idealized by means of Finite Elements (FE). Uncertainties of structural 
systems are modeled by random variables and/or stochastic processes whose 
characteristics must be estimated from measurements. Design objectives and 
restrictions, including reliability measures, have to be formulated mathematically. 
Based on these models structural and reliability analyses provide information for 
design modification and selection of an optimal design solution. 

The complexity of the reliability-based optimal design problem depends strongly 
on the types of the considered design variables (sizing, shape, material and global 
topological variables) and their combinations. Most studies performed in the fields 
of sizing and shape optimization utilize standard optimization algorithms. 
Comparatively little work is available with respect to the consideration of material 
parameters in overall structural optimization. If parameters of the objective and 
constraint functions are considered as random variables, these functions, of course, 
also become random, which in turn makes the design optimization 
problem extremely complicated. At present, no literature on realistic 
engineering examples of such stochastic structural design problems appears to be 
available. So far only statistical parameters of the random variables and reliability 
measures are included in the design problem formulation. It is important to note 
that design variables are always deterministic variables, because otherwise design 
modifications could not be carried out in practice. 

The RBO approach viewed as a mathematical programming problem raises the 
question with respect to the meaning of an optimum solution. In the literature 
various alternatives are suggested: e.g. minimization of the expected total costs of 
the structure including initial costs and costs due to failure, minimization of the 
overall probability of failure, weight minimization with respect to reliability 
constraints, etc. Very often more than one objective is considered, e.g. the 
minimization of the expected total cost and the maximization of the overall 
system reliability. Such design problems are known as multiobjective or 
multicriteria optimization problems. Aside from the individual interests it has to 
be kept in mind that the information resulting from the optimization procedure is 
generally used for decision support only. 

With respect to the definition of failure and the related costs different damage 
levels may also be considered for the assessment of the structural reliability. Total 
collapse, i.e. total failure of a structure under extreme loads as well as partial 
damage have to be taken into account in a realistic concept of design optimization. 
The latter damage criterion may influence the design decisions significantly. In 
most problem formulations the time-dependent behaviour of structural systems is not 
taken into account. System parameters usually change during the 
lifetime, e.g. due to changing load conditions, fatigue, repairs, etc., which in turn 
changes the reliability, so that it is necessary to extend the design 
objectives. When the structural reliability decreases with time it is often necessary 
to develop maintenance programs. Optimal inspection and repair strategies may be 
integrated in the reliability-based optimal design problem. Hence, efficient 
methods and strategies are needed to solve the extensive reliability-based optimal 
design problem. This applies to algorithms of structural, stochastic as well as 
optimization analyses, respectively. 

As RBO problems are strongly nonlinear this applies particularly to nonlinear 
programming methods which are utilized to solve the design problems. For this 
purpose in the literature a series of algorithms has been applied. Usually, 
algorithms are utilized as a black-box and are directly connected to the routines for 
reliability analysis. This means that the subprogram for the reliability analysis is 
called by the optimization program through an interface. This is certainly the most 
general approach because the optimization problem is separated from the reliability 
problem. A proper data transfer through an interface is necessary. In RBO the 
NLPQL algorithm [12, 13] has proven to be one of the most powerful tools for 
nonlinear programming problems. As RBO is an interdisciplinary engineering task 
it requires the combination of different interacting analysis and synthesis tools. 
These disciplines are structural modeling and analysis, reliability modeling and 
analysis, decision modeling, mathematical programming, all within an efficient 
software environment. 

In this context the NLPQL algorithm is implemented into an interactive, 
modular, and flexible software environment such that it can be applied to RBO or 
deterministic optimization problems in a most general form. Single as well as 
multiple optimization problems can be handled. The problem function values and 
their gradients can be provided on COSSAN-User input file level by object 
oriented programming [2]. 



2 Reliability-Based Optimization Problems and Procedures 



2.1 Problem Formulations 

One of the major goals of structural reliability theory is the optimization of 
structural design based on reliability concepts. For this purpose the quite different 
disciplines reliability analysis and mathematical programming are interrelated by 
an optimum design problem formulation. 



2.2 Deterministic Problem Formulation 

The general aim of optimization is to find extrema, that is, minima or maxima, of 
commonly complicated nonconvex real-valued functions. These functions are used 
to compare various design solutions of the investigated system. In most design 
problems specific constraints must be satisfied. Depending on the respective 
optimization problem inequality constraints and/or equality constraints have to be 
considered in the formulation. Thus, a general mathematical model for 
deterministic design optimization can be defined. 

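Identify a vector $\mathbf{x} \in S$ of design variables to minimize an objective function 
defined within the domain $S \subset \mathbb{R}^n$ 

$$f(\mathbf{x}) = f(x_1, x_2, \ldots, x_n) \to \min \tag{2.1}$$

subject to equality constraints and inequality constraints respectively 

$$h_j(\mathbf{x}) = h_j(x_1, x_2, \ldots, x_n) = 0, \quad j = 1 \text{ to } p \tag{2.2}$$

$$g_i(\mathbf{x}) = g_i(x_1, x_2, \ldots, x_n) \le 0, \quad i = 1 \text{ to } m \tag{2.3}$$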

Lower and upper bounds of the design variables, the so-called side constraints, are 
included. 
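
As an illustration of the general model (2.1)-(2.3), the following minimal Python sketch stores an objective, equality and inequality constraints and side constraints, and checks whether a candidate design is feasible. The class name `DesignProblem`, the tolerance and the toy functions are assumptions made for this example only; they are not taken from the paper or from COSSAN.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


@dataclass
class DesignProblem:
    """Hypothetical container for a problem of the form (2.1)-(2.3)."""
    objective: Callable[[Vector], float]                        # f(x) -> min
    equalities: List[Callable[[Vector], float]] = field(default_factory=list)    # h_j(x) = 0
    inequalities: List[Callable[[Vector], float]] = field(default_factory=list)  # g_i(x) <= 0
    bounds: List[Tuple[float, float]] = field(default_factory=list)              # side constraints

    def is_feasible(self, x: Vector, tol: float = 1e-8) -> bool:
        """Return True if x satisfies all constraints within the tolerance tol."""
        in_bounds = all(lo - tol <= xi <= hi + tol
                        for xi, (lo, hi) in zip(x, self.bounds))
        eq_ok = all(abs(h(x)) <= tol for h in self.equalities)
        ineq_ok = all(g(x) <= tol for g in self.inequalities)
        return in_bounds and eq_ok and ineq_ok


# Toy instance (assumed): minimize x1^2 + x2^2 subject to 1 - x1 - x2 <= 0
# and side constraints 0 <= x_i <= 2.
problem = DesignProblem(
    objective=lambda x: x[0] ** 2 + x[1] ** 2,
    inequalities=[lambda x: 1.0 - x[0] - x[1]],
    bounds=[(0.0, 2.0), (0.0, 2.0)],
)
```

Keeping the problem functions separate from any solver mirrors the black-box use of optimization algorithms described in the introduction: any algorithm that can call `objective` and the constraint functions can be attached to such a container.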

Depending on the types of the objective and constraint functions a series of 
different mathematical programming algorithms are available. As structural 
optimization problems are usually formulated as nonlinear programming 
problems, only the respective mathematical programming procedures are mentioned, 
e.g. feasible direction method, gradient projection method, generalized reduced 
gradient method, linearization method, cost function bounding method, method of 
moving asymptotes, sequential quadratic programming methods, potential 
constraint strategy, etc. (see e.g. [1, 12, 13]). Other optimization algorithms based 
on Monte Carlo simulation (see e.g. [11]) and evolutionary procedures are known 
to solve very complex problems especially when discontinuous functions are 
considered. 

The RBO problem can be defined in a similar form. The important difference lies 
in the properties of the design variables and the parameters of the problem 
function. Deterministic variables, stochastic variables, stochastic parameters of the 
random design variables and reliability measures may be introduced in the 
optimum design formulation. 

2.3 Stochastic Problem Formulations 

In RBO of structural systems it is assumed that both the loads and the member 
strengths respectively are random system parameters. If some of the function 
parameters are random, the optimum design formulations - objective function and 
constraints - become random functions. The general stochastic optimization 
problem writes: 

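$$f(\mathbf{x}, \mathbf{Y}, \mathbf{r}(\mathbf{x}, \mathbf{Y}, \mathbf{y}')) \to \min \tag{2.4}$$

subject to 

$$h_j(\mathbf{x}, \mathbf{Y}, \mathbf{r}(\mathbf{x}, \mathbf{Y}, \mathbf{y}')) = 0, \quad j = 1 \text{ to } p \tag{2.5}$$

$$g_i(\mathbf{x}, \mathbf{Y}, \mathbf{r}(\mathbf{x}, \mathbf{Y}, \mathbf{y}')) \le 0, \quad i = 1 \text{ to } m \tag{2.6}$$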

where 

x is the vector of deterministic design variables 
Y is the vector of random variables 
r is the vector of the considered probability functions 
y' are the statistical parameters of the random variables. 

The statistical parameters y' may also be included in the deterministic design 
vector x. 

A somewhat involved situation would result from introducing random variables 
as design parameters into the design process. For the theoretical background see 
e.g. [7, 8, 9]. 

An RBO problem is a stochastic structural optimization problem. This is due to the 
fact that probability functions with respect to the considered reliability measures 
have to be determined. 



2.4 Reliability-Based Problem Formulations 

Already in 1956 Freudenthal discussed the problem of specification of an 
acceptable risk "...on the basis of economic balance between the cost of increasing 
the safety and the cost of failure" [4]. Thereby the optimal economic probability of 
failure should make the sum of all anticipated costs a minimum. 

2.4.1 Single Objective Problem 

In probabilistic context several reliability-based optimization formulations have 
been suggested. Basically they can be written in the form 

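$$f(\mathbf{x}, \mathbf{y}', \mathbf{r}(\mathbf{x}, \mathbf{Y}, \mathbf{y}')) \to \min \tag{2.7}$$

subject to 

$$h_j = h_j(\mathbf{x}, \mathbf{y}', \mathbf{r}(\mathbf{x}, \mathbf{Y}, \mathbf{y}')) = 0, \quad j = 1 \text{ to } p \tag{2.8}$$

$$g_i = g_i(\mathbf{x}, \mathbf{y}', \mathbf{r}(\mathbf{x}, \mathbf{Y}, \mathbf{y}')) \le 0, \quad i = 1 \text{ to } m \tag{2.9}$$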

It should be pointed out that in the problem formulation as shown in the above 
equations no random functions are considered. Parameters for the description of the 
joint distribution function of the basic random variables are denoted by y', and r 
represents the vector of probability functions or reliability measures e.g. failure 
probabilities with respect to element failure modes. 

2.4.2 Multiobjective Problem 

In a number of structural optimization problems more than one objective 
is pursued. There are basically two alternative ways to approach such 
problems: 

a) the collection of objectives is rewritten as one objective function, and 

b) the multiobjective problem formulation is treated by application of vector 
optimization methods. 

From the current works related to RBO it can be seen 
that in most cases the first strategy is pursued. 

The collection of subobjectives may follow the approach 

$$P = \sum_{i=1}^{m} w_i f_i(\mathbf{x}, \mathbf{y}', \mathbf{r}(\mathbf{x}, \mathbf{Y}, \mathbf{y}')) \to \min \tag{2.10}$$

$$\mathbf{f} = (f_1, \ldots, f_m)^T \tag{2.11}$$

where $w_i$ are weights representing the importance of each single objective. The 
multiobjective problem then writes 

$$\mathbf{f} \to \min \tag{2.12}$$

which means that the single objective functions should be minimized 

$$f_i(\mathbf{x}, \mathbf{y}', \mathbf{r}(\mathbf{x}, \mathbf{Y}, \mathbf{y}')) \to \min \tag{2.13}$$

where the design vectors x and y' are elements of the feasible set $S \subset \mathbb{R}^n$ defined 
by the constraints. There is generally no unique point at which all the objectives 
(2.13) reach their minima simultaneously. Therefore, the approach (2.10) might be 
preferred for practical purposes. 
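
The effect of the weights in the weighted-sum approach (2.10) can be seen in a small Python sketch. The two quadratic objectives, the one-dimensional design grid and the function name `weighted_sum_minimizer` are assumptions chosen only for illustration; a plain grid search stands in for a real optimization algorithm.

```python
def weighted_sum_minimizer(objectives, weights, candidates):
    """Return the candidate design minimizing sum_i w_i * f_i(x), cf. (2.10)."""
    def scalarized(x):
        return sum(w * f(x) for w, f in zip(weights, objectives))
    return min(candidates, key=scalarized)


# Two conflicting toy objectives (assumed): f1 favours x = 0, f2 favours x = 2.
def f1(x):
    return x ** 2


def f2(x):
    return (x - 2.0) ** 2


# Candidate designs on a grid over the feasible interval [0, 2].
grid = [i / 1000.0 for i in range(2001)]

for w1 in (0.9, 0.5, 0.1):
    x_opt = weighted_sum_minimizer([f1, f2], [w1, 1.0 - w1], grid)
    print(f"w1={w1:.1f}  x*={x_opt:.3f}  f1={f1(x_opt):.3f}  f2={f2(x_opt):.3f}")
```

Each choice of weights selects one compromise between the two objectives, which is why the scalarized problem has a unique answer although the objectives (2.13) have no common minimizer.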

2.4.3 Standard Reliability-Based Formulations 

The most common objectives include: 

a) minimization of the expected total cost of the structure or 

b) maximization of the expected overall utility of the structure 

c) minimization of the probability of failure for a fixed structural cost, or 

d) the minimization of the expected cost of the structure for a specified level of 
failure probability. 

Cases a) and b) can be formulated as unconstrained optimization problems, while 
problems c) and d) include constraints either as equality or inequality functions. 
Various extensions and combinations of these approaches are also possible. 
Mathematical formulations of cases c) and d) are shown in the following. 

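Find the design vector $\mathbf{x} \in S$ such that it is a solution to 
case c: 

$$\beta_{\mathrm{system}}(\mathbf{x}) \to \max \quad \text{or} \quad p_{f,\mathrm{system}}(\mathbf{x}) \to \min \tag{2.14}$$

subject to 

$$f(\mathbf{x}) \le f^{\max} \tag{2.15}$$

$$x_{il} \le x_i \le x_{iu}, \quad i = 1 \text{ to } n$$

where $\beta_{\mathrm{system}}(\mathbf{x})$ is the structural system reliability index, $p_{f,\mathrm{system}}(\mathbf{x})$ is the 
system failure probability, and $f^{\max}$ is the maximum acceptable initial cost or 
weight of the structure; 

case d1: 

$$f(\mathbf{x}) \to \min \tag{2.16}$$

subject to 

$$\beta_i(\mathbf{x}) \ge \beta_i^{\min}, \quad i = 1 \text{ to } m \tag{2.17}$$

$$x_{il} \le x_i \le x_{iu}, \quad i = 1 \text{ to } n$$

where $\beta_i(\mathbf{x})$ is the reliability index related to failure mode i, and $\beta_i^{\min}$ 
represents the acceptable lower limit; 

case d2: 

$$f(\mathbf{x}) \to \min$$

subject to 

$$\beta_{\mathrm{system}}(\mathbf{x}) \ge \beta_{\mathrm{system}}^{\min}, \qquad p_{f,\mathrm{system}}(\mathbf{x}) \le p_{f,\mathrm{system}}^{\max}$$

$$x_{il} \le x_i \le x_{iu}, \quad i = 1 \text{ to } n$$

where $\beta_{\mathrm{system}}^{\min}$ and $p_{f,\mathrm{system}}^{\max}$ are acceptable limits of the system reliability 
measures, and $f(\mathbf{x})$ is the cost of the structure. 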

It is quite obvious that with an increasing number of constraints the 
conditioning of the optimization problem becomes more difficult. The chance to 
find optimal solutions decreases. Thus, it is important to formulate the 
optimization problem in an appropriate manner. 

2.5 Decision Models for Global Optimization 

2.5.1 Physical Objectives 

A most frequently used approach is to consider the weight of a mechanical 
structure as objective function which is to be minimized. In addition, with respect 
to RBO, failure probability or safety index constraints are introduced into the 
design problem. Design variables may be cross-sectional dimensions or global 
topological parameters. The idea to consider the structural weight as design 
objective is simply derived from pure deterministic design aims in civil 
engineering. In the simplest form the problem writes, e.g. 

$$W = \sum_{j=1}^{m} \rho_j A_j l_j \to \min \tag{2.19}$$

subject to 

$$p_{fj} \le p_{fj}^{\max}, \quad j = 1 \text{ to } p \tag{2.20}$$

where $A_j$ represents the cross-sectional area of element j of the length $l_j$, and $\rho_j$ 
is the unit weight. The design vector x contains cross-sectional dimensions 
whereas the global topology remains unchanged. In the above equation failure 
probability constraints related to the p considered failure criteria are formulated. 
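
For a single element with an explicit limit state, a weight minimization of the type (2.19)-(2.20) reduces to finding the smallest cross-sectional area that still meets the failure probability constraint. The sketch below assumes a tension bar with deterministic yield stress and a normally distributed load; the limit state, all numbers and the function names are illustrative assumptions, not data from the paper. Bisection is used here in place of a general nonlinear programming algorithm.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def failure_probability(area, yield_stress, load_mean, load_std):
    """p_f = P[A * sigma_y - S <= 0] for a normally distributed load S."""
    beta = (area * yield_stress - load_mean) / load_std  # reliability index
    return STD_NORMAL.cdf(-beta)


def minimum_area(p_f_max, yield_stress, load_mean, load_std,
                 lo=0.0, hi=1.0, tol=1e-10):
    """Smallest area with p_f(area) <= p_f_max, found by bisection.

    Assumes p_f(lo) > p_f_max >= p_f(hi); p_f decreases monotonically in area.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if failure_probability(mid, yield_stress, load_mean, load_std) <= p_f_max:
            hi = mid   # constraint satisfied: try a lighter bar
        else:
            lo = mid   # constraint violated: more material is needed
    return hi


# Assumed toy data: yield stress 240e3 kN/m^2, load S ~ N(100 kN, 20 kN),
# bar length 2 m, unit weight 78.5 kN/m^3, admissible p_f of 1e-3.
sigma_y, mu_s, sd_s = 240e3, 100.0, 20.0
area = minimum_area(1e-3, sigma_y, mu_s, sd_s)
weight = 78.5 * area * 2.0  # W = rho * A * l for a single element
```

Because the weight grows with the area while the failure probability falls, the constraint is active at the optimum; in this toy case the result can be checked against the closed form A = (mu_S + beta * sigma_S) / sigma_y with beta taken from the inverse normal distribution.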

2.5.2 Expected Utility Objectives 

In the next step of improved decision making expected utilities or expected 
monetary values are used as objective criterion. Cost expectations derived from 
weighted cost components due to product manufacturing and failure consequences 
are considered. For example, the following cost function is used to assess the 
design alternatives 

$$C = c_I(\mathbf{x}) + I_s(\mathbf{x}, \mathbf{y})\, c_s(\mathbf{x}) + I_c(\mathbf{x}, \mathbf{y})\, c_c(\mathbf{x}) \to \min \tag{2.21}$$

where $c_I$ is the initial cost, $c_s$ are costs due to e.g. partial failure, and $c_c$ is the cost 
due to total collapse failure. The factors $I_s$ and $I_c$ are indicator terms which are set 
to one if partial failure or collapse failure occurs, and they are set to zero if no 
failure occurs. The total cost C depends on random quantities and is therefore a 
random function. In conventional decision making the judgement is based on the 
expected value of C which can be estimated as follows 

$$E[C] = c_I + p_s c_s + p_c c_c \to \min \tag{2.22}$$

where 

$$p_s = P[g_s(\mathbf{x}, \mathbf{Y}) \le 0] \tag{2.23}$$

$$p_c = P[g_c(\mathbf{x}, \mathbf{Y}) \le 0] \tag{2.24}$$

with the occurrence probabilities $p_s$ and $p_c$, which are dependent on the design 
vector x and the limit state functions $g_s(\mathbf{x}, \mathbf{Y})$ and $g_c(\mathbf{x}, \mathbf{Y})$. In order to reflect realistic 
design it is assumed that $p_c$ is small compared to $p_s$. The random vector Y 
follows the joint distribution function $f_{\mathbf{Y}}(\mathbf{y})$. 
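
A minimal sketch of how an expected cost of the form (2.22) can be estimated by crude Monte Carlo simulation is given below. The scalar design variable, the two limit state functions, the load distribution and the cost figures are assumptions made for illustration; for realistic structures the limit states would not be available in this explicit form.

```python
import random
from statistics import NormalDist


def initial_cost(x):
    return 50.0 * x                      # c_I(x), assumed linear in the design x


def g_s(x, y):
    return 0.8 * x - y                   # assumed partial failure limit state


def g_c(x, y):
    return x - y                         # assumed collapse limit state


def expected_cost(x, c_s, c_c, n_samples=20000, seed=0):
    """Return (E[C], p_s, p_c) estimated by crude Monte Carlo simulation."""
    rng = random.Random(seed)
    n_s = n_c = 0
    for _ in range(n_samples):
        y = rng.gauss(100.0, 20.0)       # assumed load Y ~ N(100, 20)
        if g_s(x, y) <= 0.0:             # indicator I_s = 1
            n_s += 1
        if g_c(x, y) <= 0.0:             # indicator I_c = 1
            n_c += 1
    p_s, p_c = n_s / n_samples, n_c / n_samples
    return initial_cost(x) + p_s * c_s + p_c * c_c, p_s, p_c


total, p_s, p_c = expected_cost(150.0, c_s=1.0e4, c_c=1.0e6)
# Exact values for this toy case, available because Y is normal:
p_s_exact = NormalDist().cdf(-1.0)       # P[Y >= 120]
p_c_exact = NormalDist().cdf(-2.5)       # P[Y >= 150]
```

Since collapse implies partial failure in this toy model, the estimate of the collapse probability can never exceed that of the partial failure probability, in line with the assumption stated above.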
The cost representation is a continuous function of x, but it is discrete in y due to 
the discrete failure indicators $I_s$ and $I_c$. The costs as a consequence of failure events 
either occur or do not occur depending on the state of Y, respectively the j-th 
realization $\mathbf{y}_j$. The cost function becomes continuous in y too, if instead of 
failure indicators continuous damage indicators are used so that an infinite number 
of damage states are possible. Hence, the distribution function becomes 
continuous in x and y. The objective of meeting this requirement may then be 
expressed as: 



$$C = c_I(\mathbf{x}) + \sum_{k=1}^{m} c_{D_k}(\mathbf{x}, D_k(\mathbf{x}, \mathbf{Y})), \quad D_k \in \Omega = [0, 1] \tag{2.25}$$

with damage indicator $D_k$ concerning the k-th damage criterion, e.g. 
displacements, plastic deformation, and the respective damage consequence cost 
$c_{D_k}$ of m failure or damage elements. The reliability-based problem then writes 

$$E[C] = c_I(\mathbf{x}) + \sum_{k=1}^{m} E[c_{D_k}(\mathbf{x}, D_k(\mathbf{x}, \mathbf{Y}))] \tag{2.26}$$

subject to 

$$P[D_k(\mathbf{x}, \mathbf{Y}) > d_k] \le p_{f_k}^{\max} \tag{2.27}$$

where $p_{f_k}^{\max}$ is an upper bound of the failure probability related to criterion k and 
fragility value $d_k$. Following such a strategy, a probabilistic overall assessment 
of design alternatives becomes possible and hence, a design process based on the 
probability structure of the objective can be performed in a most rational way. 

2.5.3 Cost Probability Criteria 

The aim of this type of decision strategy is to determine the distributions of cost 
functions and to use the expected pattern of outcomes as information for decision 
making. Higher statistical moments and cost probabilities can be utilized to make 
decisions of high quality. They may be considered as objectives or as constraints 
in the problem formulation. 

If the distribution function of the total cost can be determined, i.e. in the 
following form: 

$$F_C(c) = P[C \le c] \tag{2.28}$$

the decision problem respectively the design problem can be formulated e.g. as 

$$P[C \le c_{\max}] \to \max \tag{2.29}$$

or alternatively 

$$P[c_{\min} \le C \le c_{\max}] \to \max \tag{2.30}$$

where a specified probability guides the design process. In the literature 
such decision strategies are rarely applied, especially if the investigations are of 
a more theoretical nature. Quite naturally much more attention to cost 
probability-based decision criteria comes from practitioners of economic decision making. 

For the probabilities usually no analytical functions are available. Therefore, 
MCS techniques serve as convenient and appropriate tools for estimation of such 
probabilities. By simulation of the random quantities the cumulative cost 
frequencies are obtained and the probabilities of failure or other event occurrence 
probabilities are estimated in parallel. The results are available in each design 
iteration step of the design process. Hence, an iteration history of estimates of 
distribution functions is provided. The distribution function is estimated by 

= (2.31) 



which is due to the simulated cost function 

= (2.32) 

for a design vector $\mathbf{x}_k$ corresponding to the k-th design iteration, and $\mathbf{y}_j$ are 
realizations according to the j-th simulation of the random variables. Estimates of 
failure probabilities write 

= (2.33) 

$$p_{ck}(g_{ck}(\mathbf{x}_k, \mathbf{Y})) = p_c \tag{2.34}$$

with $p_s$ and $p_c$ as occurrence probabilities due to partial and total structural 
failure respectively. Certainly, in most cases direct simulation of structural 
systems is practically impossible. Therefore, approximation techniques are required 
to reduce the numerical effort significantly. An approximate damage representation 
can be considered as one possibility to overcome the numerical problems and to 
estimate cost function probabilities utilized for structural optimization. 
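
The estimation of a cost distribution function from simulated cost values can be sketched as follows. The estimator used here is the standard empirical distribution function, i.e. the fraction of simulated costs not exceeding c; whether this coincides exactly with the estimator of the original equations is an assumption. The cost model, the load distribution and all names are likewise hypothetical.

```python
import random
from bisect import bisect_right


def empirical_cdf(samples):
    """Return F_hat with F_hat(c) = (number of samples <= c) / N."""
    ordered = sorted(samples)
    n = len(ordered)

    def f_hat(c):
        return bisect_right(ordered, c) / n

    return f_hat


def cost_realization(x, y, c_s=1.0e4, c_c=1.0e6):
    """Cost for one load realization y, with 0/1 failure indicators."""
    cost = 50.0 * x                 # initial cost (assumed linear in x)
    if 0.8 * x - y <= 0.0:          # partial failure (assumed limit state)
        cost += c_s
    if x - y <= 0.0:                # collapse (assumed limit state)
        cost += c_c
    return cost


rng = random.Random(1)
costs = [cost_realization(150.0, rng.gauss(100.0, 20.0)) for _ in range(5000)]
F_hat = empirical_cdf(costs)
prob_within_budget = F_hat(2.0e4)   # estimate of P[C <= 2.0e4]
```

With discrete failure indicators the simulated cost takes only a few distinct values, so the estimated distribution function is a step function; continuous damage indicators would smooth it.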

2.6 Decision Models for Local Optimization 

The decision models may also be concerned with optimal manufacturing and 
maintenance of welding constructions taking into account fatigue crack growth 
within welding joints. Unstable fatigue crack growth is considered as failure 
criterion in the local RBO approach. Different optimization strategies are possible, 
depending on product lifetime periods and the pursued quality assurance 
philosophies. All these approaches are based on expected cost formulations and 
probability constraints. However, other problem formulations are generally also 
possible. This issue is not discussed here; for further 
details see e.g. [5] and the references mentioned therein. 

2.7 Procedures 

So far when dealing with problem formulations it was already mentioned that the 
main problem of RBO of structures is that the limit state function is usually not 
known in explicit form and furthermore it depends on the structural design. It has 
to be kept in mind that when a reliability analysis is performed the results are 
related to a specific structural design, and that the limit state function is defined as 
function of the random variables conditioned on that particular design status: 

$$g = g(\mathbf{x}, \mathbf{y}), \quad \mathbf{x} \in S \subset \mathbb{R}^n \tag{2.35}$$

where a failure event $\mathbf{y}_j$ is identified by 

$$g(\mathbf{x}, \mathbf{y}_j) \le 0 \;\to\; \text{failure} \tag{2.36}$$

Hence, the estimated reliability measure which may be used in the optimization 
problem formulation depends also on the design status, e.g. the probability of 
failure writes by definition 

$$p_f = p_f(\mathbf{x}) = P[g(\mathbf{x}, \mathbf{Y}) \le 0] \tag{2.37}$$

The difficulties in solving the RBO problem reduce considerably if one assumes 
that the limit state function is given as function of the design and random 
variables, and the requirement of the availability of an efficient and accurate 
reliability analysis procedure is fulfilled. For these cases the limit state function 
may have the form 



$$g(\mathbf{x}, \mathbf{y}) = \sum_{k=1} a_k M_k - S = a_1 x_1 x_2^2 R_e + \ldots - S = a_1 x_1 x_2^2 y_1 + \ldots - y_n \tag{2.38}$$

which corresponds to the limit state function of a frame structure with rectangular 
cross sections with plastic hinge collapse modes. The reliability measures (safety 
index or failure probability) can be determined quite simply for each design vector 
x. For complex real-life structures, however, the limit state function is generally 
unknown. Due to this fact it is at least necessary to approximate the limit state 
function $g(\mathbf{x}, \mathbf{y})$ by a design dependent so-called response surface $\hat{g}(\mathbf{x}, \mathbf{y})$, i.e.: 






(2.39) 
In addition it is necessary to approximate reliability measures as functions of the 
design variables, primarily the failure probabilities. 



3 Numerical Examples 

3.1 Direct Approach of NLP 

3.1.1 General Considerations 

The aim of RBO is to develop procedures and software tools such that one is in 
the position to apply a mathematical programming method to any general 
nonlinear optimization problem with respect to RBO. For this purpose the various 
analysis and synthesis tools have to be preferably available in black-box form 
within a powerful software environment providing flexible interfaces. In other 
words, the reliability-based structural optimization approach is simply based on 
the direct connection of structural analysis (FE), reliability analysis (use of RSM 
for problems of engineering practice) and optimization methods (e.g. NLPQL for 
general problem formulations). 

In the first step the decision model has to be developed and the problem 
functions have to be defined, e.g. the objective function determines the total 
expected cost including failure consequence costs subject to reliability constraints. 
In addition to that the reliability model has to take into account all uncertainties of 
the decision system so that the reliability measures can be estimated utilizing an 
efficient reliability analysis method. At this point the main problem occurs: for 
real complex structures the limit state function is usually not known in explicit 
form. Hence, single limit state points can be calculated only by application of 
Finite Element analysis methods, which increases the numerical effort 
tremendously. For reasons of efficiency direct MCS to calculate the response is not 
applicable. Therefore, approximate methods, such as the RSM, have to be utilized. 

Following definitions and having available the calculation tools, the problem 
functions can be determined. This is the point now where the optimization 
algorithm can be executed. During the iterative procedure the problem functions 
and their gradients are evaluated according to the rules of the mathematical 
programming algorithm. The optimization procedure is executed in the so-called 
reverse communication mode (see [5] for details) so that external or third-party 
software can be utilized as well to calculate the problem functionals (e.g. FE 
software). According to this the actual design will be modified. This causes, in 
turn, a modification of the mechanical system and/or the reliability model. 
Subsequently, the structural response quantities are calculated (use of FEM) to 
determine the points at the limit state surface and the response surface as well as 
other functionals as required. With the calculation of safety indices or failure 
probabilities (use of RSM and MCS) the problem function values are determined. 
Iteration and/or other parameters may be controlled interactively if the 
optimization procedure does not show convergence. 
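
The idea of reverse communication can be sketched with a Python generator: the optimizer never calls the problem functions itself but hands a trial design back to the calling program and waits for the function value. The sketch below does not reproduce the NLPQL interface; a one-dimensional golden-section search is used as a simple stand-in optimizer, and `external_analysis` is a hypothetical placeholder for an FE or reliability computation.

```python
import math


def golden_section_rc(lo, hi, tol=1e-6):
    """Golden-section search in reverse communication style.

    The generator never calls the objective itself. It yields a trial design x
    and expects the caller to send back the function value f(x).
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    x1 = hi - inv_phi * (hi - lo)
    x2 = lo + inv_phi * (hi - lo)
    f1 = yield x1
    f2 = yield x2
    while hi - lo > tol:
        if f1 < f2:                      # minimum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - inv_phi * (hi - lo)
            f1 = yield x1
        else:                            # minimum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + inv_phi * (hi - lo)
            f2 = yield x2
    return 0.5 * (lo + hi)


def drive(solver, evaluate):
    """Calling program: evaluates each requested design with any external tool."""
    x = next(solver)
    try:
        while True:
            x = solver.send(evaluate(x))
    except StopIteration as stop:
        return stop.value


def external_analysis(x):
    """Stand-in for an FE or reliability run (assumed toy objective)."""
    return (x - 0.3) ** 2


x_opt = drive(golden_section_rc(0.0, 1.0), external_analysis)
```

Because control returns to the caller at every function evaluation, the caller is free to run third-party software, log intermediate designs or stop the iteration interactively.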
The numerical efforts result from repeated response and reliability calculations 
due to the following tasks: 

- iterations of mathematical programming algorithm 

- line search calculations (if NLPQL is utilized) 

- gradient calculations (all Newton Methods) 

- limit state point search to determine the coefficients of the response 
surface, especially when nonlinear structural behaviour is taken into 
account 

- additional effort for different classes of failure criteria (e.g. plastic hinge 
mechanisms and stability criteria considered simultaneously). 

The major numerical efforts are due to limit state calculations, because FE 
analyses have to be performed several times. However, the situation changes 
completely if the limit state can be formulated in explicit form. In this case the 
direct optimization procedure does not change, but no response calculations are 
necessary. The solution of this type of problem does not cause severe difficulties 
as the failure probability can be estimated by simulation of the random variables 
only. The efficiency of the developed procedure and software is demonstrated by 
simple examples in the following subsections. 

The first example shows the general logic of RBO when FEM and RSM as well 
as Importance Sampling simulation procedures are used. The intention of the 
subsequent comparative example is to show the efficiency and the quality of the 
numerical methods. 

3.1.2 Simple Frame Structure Using the RSM 

In this example a simple frame structure (see Fig. 3.1) is analyzed to demonstrate 
the basic principles of the direct optimization procedure. The NLPQL algorithm is 
utilized directly in connection with the RSM and Importance Sampling. The 
decision problem is formulated as an unconstrained reliability based minimization 
problem of the expected total costs. The initial or production costs are assumed to 
be proportional to the structural mass. The failure consequence costs are due to 
loss of serviceability and occurrence of collapse failure. First yielding in any 
component represents the loss of serviceability and collapse is modelled by 
complete plastification of the structure (see eq. (2.38)). The optimal results are 
calculated for various combinations of partial (first yielding) and total (collapse) 
failure costs. 

The simple one-bay, one-story frame structure is modelled by eight pipe 
elements and nine nodes and an ideal-elastic ideal-plastic material behaviour is 
assumed. The structure is exposed to both horizontal (H) and vertical (V) load 
components. The horizontal load component is assumed to be normally distributed 
with a zero mean value and a standard deviation of 20 kN, the vertical loading 
follows a lognormal distribution LN(50 kN, 20 kN). For simplicity the 
uncertainties of the material properties are neglected. All elements have the same 
dimensions and properties. In the next step after the structural modeling and the 
definition of the sample space the actual reliability analysis can be performed. As 
only two random variables are considered, this task seems to be not very difficult 
to solve. For such a simple structure this is the case if the component limit states 
(plastic hinge mechanisms) can be formulated explicitly. However, FE methods 
and the RSM are used for general type problems. Then the reliability analysis 
procedure contains the following steps: 

- calculation of a specific number of limit state points for each 
  considered failure or serviceability criterion 

- evaluation of the coefficients of the response surface; a 2nd-order 
  polynomial is utilized 

- determination of the design point utilized for importance sampling 
  (suboptimization problem) 

- weighted simulation of the random variables around the design point 
  and estimation of the system failure probability (6048 simulations). 
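
The last step of the list, weighted simulation around the design point, can be sketched as follows. The example assumes independent standard normal variables, a unit-variance normal sampling density centred at the design point and a linear toy limit state with known design point; none of these choices is taken from the paper, and only the sample size of 6048 is borrowed from the text.

```python
import math
import random
from statistics import NormalDist


def importance_sampling_pf(limit_state, design_point, n_samples=6048, seed=0):
    """Estimate P[g(U) <= 0] for independent standard normal variables U.

    Samples are drawn from a unit-variance normal density h centred at the
    design point; each failure sample is weighted by f(u) / h(u).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        u = [rng.gauss(c, 1.0) for c in design_point]
        if limit_state(u) <= 0.0:
            # log of f(u) / h(u); the normalizing constants cancel
            log_w = sum(-0.5 * ui ** 2 + 0.5 * (ui - c) ** 2
                        for ui, c in zip(u, design_point))
            total += math.exp(log_w)
    return total / n_samples


# Assumed toy limit state in standard normal space: g(u) = 3 - u1,
# with design point (3, 0) and exact failure probability Phi(-3).
beta = 3.0
p_hat = importance_sampling_pf(lambda u: beta - u[0], [beta, 0.0])
p_exact = NormalDist().cdf(-beta)
```

Roughly half of the weighted samples fall into the failure domain, so a few thousand simulations suffice here, whereas crude Monte Carlo simulation would see only a handful of failures at this probability level.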

In this example five interpolation points of the response surface have to be 
determined according to the first yielding condition and the collapse criterion. In 
practice first yielding is represented by more than one iteration in a 
Newton-Raphson procedure. Structural collapse is modelled by exceedance of a "maximum" 
number of iterations to reach equilibrium. 
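
For two random variables, five interpolation points are exactly enough to fix a second-order polynomial without mixed terms. The sketch below shows one common textbook variant, in which the polynomial is fitted to function values at a centre point and at one point on either side along each axis. This is an assumption made for illustration: the procedure described in the text interpolates points on the limit state surface obtained from FE analyses, and the actual COSSAN scheme may differ. The function `fit_response_surface` and the toy limit state are hypothetical.

```python
def fit_response_surface(g, center, steps):
    """Fit g_hat(y) = a + sum_i b_i*d_i + sum_i c_i*d_i**2, d_i = y_i - center_i.

    Uses 2n + 1 interpolation points: the centre and one point at +/- step
    along each axis (five points for two random variables).
    """
    g0 = g(center)
    lin, quad = [], []
    for i, h in enumerate(steps):
        plus = list(center)
        plus[i] += h
        minus = list(center)
        minus[i] -= h
        g_plus, g_minus = g(plus), g(minus)
        lin.append((g_plus - g_minus) / (2.0 * h))                  # b_i
        quad.append((g_plus + g_minus - 2.0 * g0) / (2.0 * h * h))  # c_i

    def g_hat(y):
        d = [yi - ci for yi, ci in zip(y, center)]
        return g0 + sum(b * di + c * di * di
                        for b, c, di in zip(lin, quad, d))

    return g_hat


# Assumed toy limit state; in practice each call to g would be an FE analysis.
def g_true(y):
    return 5.0 - 0.5 * y[0] ** 2 - y[1] + 0.1 * y[1] ** 2


g_hat = fit_response_surface(g_true, center=[1.0, 2.0], steps=[0.5, 0.5])
```

Once the coefficients are known, the polynomial replaces the expensive structural analysis in the subsequent simulation, which is where the saving in numerical effort comes from.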

The RBO problem for the structural system shown in Fig. 3.1 is to minimize 
the total expected costs including structural costs and failure consequence costs. As 
already mentioned, two load components are considered as basic variables. Design 
variables are the pipe diameter D and the wall thickness t of the pipes. The 
unconstrained RBO problem writes (according to eq. (2.22)) 

$$E[C] = c_0(\mathbf{x}) + c_s\, p_s(\mathbf{x}) + c_c\, p_c(\mathbf{x}) \to \min \tag{3.1}$$

$$0.2 \le D \le 0.4 \ [\mathrm{m}] \tag{3.2}$$

$$0.01 \le t \le 0.02 \ [\mathrm{m}] \tag{3.3}$$



where the material cost component is given by $C_0(\mathbf{x}) = 50\,m(\mathbf{x})$, with m [kg] the
mass of the structure. The first yield probability and the collapse probability are
denoted by $p_s$ and $p_c$. The respective consequence cost components are $c_s$ and
$c_c$. The initial costs can be directly compared with failure consequence costs.
Complex constraints can therefore be avoided, which reduces the complexity of the
optimization problem.
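
As a small worked example of the objective in eq. (3.1), the sketch below evaluates the expected total cost for one design. The mass and the two failure probabilities are made-up numbers for illustration only; the cost factors correspond to one of the combinations used in the sensitivity study.

```python
def expected_total_cost(mass_kg, p_s, p_c, c_s, c_c):
    """E[C] = C0 + c_s*p_s + c_c*p_c with C0 = 50 * mass, as in eq. (3.1)."""
    return 50.0 * mass_kg + c_s * p_s + c_c * p_c

# Hypothetical design: mass and failure probabilities are assumed values.
cost = expected_total_cost(mass_kg=1200.0, p_s=1.0e-3, p_c=1.0e-5,
                           c_s=1.0e4, c_c=1.0e6)
print(cost)
```

Because all terms are expressed in cost units, the structural cost and the two expected consequence costs can simply be added, which is what allows the problem to be posed without reliability constraints.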



combination    c_s [cost units]    c_c [cost units]

     1              1.0e4               1.0e6
     2              1.0e5               1.0e6
     3              1.0e6               1.0e6
     4              1.0e4               1.0e7
     5              1.0e5               1.0e7
     6              1.0e6               1.0e7
     7              1.0e4               1.0e5
     8              1.0e5               1.0e5
     9              1.0e6               1.0e5

Tab. 3.1: Failure consequence cost factors used for sensitivity study.



A sensitivity study with respect to the variation of failure consequence costs is 
performed to show the importance of taking into account serviceability or partial 
failure criteria in addition to collapse criteria. The considered failure cost factors are 
listed in Table 3.1. The cost combinations where the "first yielding" costs are 
higher than the collapse costs are not of practical interest. However, from the 
mathematical point of view the general dependency of the optimal solution on the 
cost factors can be investigated. 

The following two Figures 3.2 and 3.3 show the optimal results of the objective 
function and the pipe radius dependent on the failure consequence costs. An 
equivalent figure for the wall thickness is not shown as for all cost combinations 
the results are identical, in fact the value of this design variable is set to its lower 
bound. 

It can be seen that in the case of very high partial failure costs (first yielding,
$c_s$ = 1.0e6) the objective and the design variable value show very little
sensitivity to variations of the collapse costs. In this RBO problem formulation
the serviceability or partial failure criterion is dominant. In contrast, the
optima are always sensitive to variations of the partial failure costs. From
these results it may be concluded that it is important to introduce partial or
serviceability criteria into the decision problem, provided that the expected
consequence cost component $c_s\,p_s(\mathbf{x})$ is high enough. For example, the optimal
results do not change very much if the cost $c_s$ is changed from 1.0e4 to 1.0e5 cost
units, but from 1.0e5 to 1.0e6 cost units the optima vary considerably. It is not
surprising that the objective and the pipe radius show the same sensitivity to
cost variations. With an increasing pipe radius the failure probabilities decrease
rapidly as the global structural stiffness increases, and vice versa. Therefore, a
more resistant structural system is required if higher failure consequence costs
are expected.




Fig. 3.2: Optimal expected total costs of simple frame structure due to various
failure consequence cost combinations (objective value plotted against collapse cost).



From the experience gained in carrying out this simple example it can be concluded
that the proposed procedure is straightforward and generally applicable. However,
from a practical point of view the numerical effort is still not satisfactory,
although the RSM and importance sampling are applied. This becomes more obvious if
a parameter study is performed. Convergence problems occur when the search
directions of limit state points are inappropriately chosen. The inaccurate
determination of the limit state points belongs to the same class of problems. In
both cases the limit state functions are not suitably approximated, so that the
failure probabilities are not accurately estimated, which in turn causes
convergence problems. Hence, interactive control of the reliability analysis and
the optimization procedure is necessary, at least for some trial calculations.

3.1.3 Simple Truss Structure Using Explicit Limit State Function

This second example also deals, first, with the general procedure of the direct
optimization approach to RBO problems. The aim is to reproduce the results
presented in [10] utilizing the COSSAN software package [2] and the algorithm
NLPQL [12, 13], and to show the efficiency of the developed software tools for
this purpose. Subsequently, this example serves as a reference example for
demonstrating the applicability of the posterior approximation strategy
discussed in Section 3.2.




Fig. 3.3: Optimal pipe radius of simple frame structure due to various failure
consequence cost combinations.



A schematic bridge structure modelled by 13 truss elements is to be optimized
such that the structural weight is minimized while satisfying a given system
reliability constraint. The structural system, its dimensions, the element
numbering and the load components are shown in Fig. 3.4. Seven design variables
are introduced which correspond to the cross-sections of the symmetrically grouped
truss elements. The structure is loaded by three vertical load components. These
load components and the 13 resistance properties of the truss elements are
considered as basic variables in this RBO problem. Since the system is statically
determinate, system failure can be modelled by exceedance of the yield stress in
any truss element (chain mechanism).

The element limit states can be formulated in explicit form as functions of the
design variables and the random variables. All element limit state functions
depend linearly on the element resistances and the three load components
mentioned above. Hence, FE analyses are not necessary for structural response
calculations to determine failure states, neither for single realizations of the
basic variables nor for response surface calculations. The element and the system
failure probabilities can simply be estimated by MCS of the random variables.

The yield stresses of the elements are considered as random resistance variables
which all have the same mean value of 248.2 N/mm² and a coefficient of variation
of 12% (9.76 N/mm²). The loads likewise have the same mean value of 66723 N and
a COV of 16% (10675.7 N). All basic variables are assumed to be normally
distributed and statistically independent.
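
A crude Monte Carlo estimate of the series-system failure probability for linear element limit states can be sketched as below. The means and coefficients of variation are those quoted in the text; the cross-sections and the load-to-member-force coefficients are hypothetical placeholders, since the actual truss geometry of Fig. 3.4 is not reproduced here. A single-element case is checked against the closed-form normal result.

```python
import math
import random
from statistics import NormalDist

MEAN_R, COV_R = 248.2, 0.12      # yield stress [N/mm^2], values quoted in the text
MEAN_S, COV_S = 66723.0, 0.16    # load component [N], values quoted in the text

def system_failure_probability(areas, load_coeffs, n_samples, seed=1):
    """Series system: failure if any element has A_i*R_i - sum_j c_ij*S_j <= 0."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_samples):
        loads = [rng.gauss(MEAN_S, COV_S * MEAN_S) for _ in range(3)]
        for area, coeffs in zip(areas, load_coeffs):
            resistance = rng.gauss(MEAN_R, COV_R * MEAN_R)
            force = sum(c * s for c, s in zip(coeffs, loads))
            if area * resistance - force <= 0.0:
                failures += 1
                break   # one failed element means system failure
    return failures / n_samples

# Hypothetical two-element system (areas in mm^2, assumed force coefficients).
pf_system = system_failure_probability([800.0, 900.0],
                                       [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5]], 20000)

# Single-element check against the exact result for a linear, normal limit state.
pf_one = system_failure_probability([800.0], [[1.0, 0.5, 0.5]], 50000, seed=3)
mean_g = 800.0 * MEAN_R - 2.0 * MEAN_S
std_g = math.sqrt((800.0 * COV_R * MEAN_R) ** 2 + 1.5 * (COV_S * MEAN_S) ** 2)
pf_ref = NormalDist().cdf(-mean_g / std_g)
print(pf_system, pf_one, pf_ref)
```

The cross-sections in this sketch are deliberately small so that failures are frequent enough for crude sampling; at the target level of the actual problem a variance reduction technique such as importance sampling would be needed.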




Fig. 3.4: Truss structure with 13 elements (data from [10]) 



The RBO problem as mentioned above is defined as a weight minimization
problem where the objective function reads (eq. (2.19))

$$W(\mathbf{A}) = \sum_{i=1}^{13} \rho_i\, l_i\, A_i \to \min \qquad (3.4)$$

with the cross sections $A_i$, the element lengths $l_i$ and the unit weight $\rho_i$. The
design space is restricted by the system reliability constraint (eq. (2.18)):

$$p_f(\mathbf{A}) \le 10^{-5}, \qquad \beta(\mathbf{A}) \ge 4.256 \qquad (3.5)$$
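
The two forms of the constraint in eq. (3.5) can be related through the usual generalized reliability index, assuming the standard definition beta = -Phi^{-1}(p_f). The sketch below only illustrates this conversion; the function names are hypothetical.

```python
from statistics import NormalDist

def beta_from_pf(pf):
    """Generalized reliability index for a given failure probability."""
    return -NormalDist().inv_cdf(pf)

def pf_from_beta(beta):
    """Failure probability for a given reliability index."""
    return NormalDist().cdf(-beta)

print(pf_from_beta(4.256))   # failure probability implied by the index in eq. (3.5)
print(beta_from_pf(1.0e-5))  # index implied by a failure probability of 1e-5
```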

The design variables $\mathbf{A} = \mathbf{x} \in S \subset \mathbb{R}^7$ correspond to seven groups of element
cross-sections. The limit state function is given in explicit form. For each
combination of the design variables the failure probability has to be estimated.
Provided that the probability of failure can be estimated with high accuracy, the
optimization procedure can then be executed as a "black box" utilizing any
available optimization algorithm. Nakib and Frangopol [10] applied the feasible
direction method and a penalty function method (reference solution).

For this type of problem formulation and under the assumptions made it can
generally be expected that the optimum design lies at the limit of the feasible
domain; in other words, the constraint defined in eq. (3.5) becomes active at the
optimum. This can be explained by the direct relation between the objective
functional (weight) and the structural stiffness: in this example the structural
weight decreases with an increasing failure probability. Hence the constraint
function (eq. (3.5)) becomes active at the optimal solution (compare Table 3.2,
values of β).

The optimum solutions calculated using different optimization algorithms are
compared in Table 3.2. The objective value at the optimum calculated with
COSSAN-NLPQL differs by only 5% from the reference value. The constraint values
at the optimum agree almost completely. There is good agreement between the
optimal design obtained with COSSAN-NLPQL and the reference solution. Deviations
might be due to the use of different reliability analysis methods (MCS vs.
bounding methods). It should be noted that, apart from some special cases, all
reliability analysis methods provide only estimates of the failure probability.
However, in this example the element limit state functions are linear and all
basic variables are normally distributed, which allows the reference solution to
be regarded as the "true" optimal solution, since in this case bounding methods
provide accurate probability estimates.

A large difference can be observed when comparing the numerical effort, taking
the number of reliability analyses as the reference measure. With 75 reliability
analyses COSSAN-NLPQL is clearly ahead and much more efficient. The effort for
calculating the design point used as center point for the weighted simulation and
the number of simulations (4096) might put the numerical advantage of COSSAN into
perspective. However, in the case of explicitly formulated limit state functions
this measure is not of real significance since no response quantities have to be
calculated during the iterative procedure. The situation changes completely if
the limit state functions are not known a priori.

Assuming that the limit state functions are, due to nonlinear structural behavior,
much more complex and, furthermore, not known explicitly, one would still like
to perform an RBO taking into account the same number of basic variables. As
mentioned before, the RSM has to be applied for this purpose. There remains only
the problem of calculating the limit state points. Using a second order
polynomial approximation it would be necessary to determine 152 interpolation
points. In the context of RBO with seven design variables, and keeping in mind
the gradient calculations, this strategy is by no means practicable anymore.
Hence, the application of additional approximate strategies is required. The
posterior approximation approach of representing the
failure probabilities as explicit functions of the design variables appears to be an
efficient way to overcome this practical engineering problem.

                    reference solution                     COSSAN
                 feasible                                  using NLPQL
variable         direction        penalty
number           [in²]       [mm²]       [in²]             [mm²]

1                1.177        745.8      1.156              934.2
2                1.845       1218.7      1.889             1156.8
3                1.139        752.9      1.167              670.3
4                1.177        745.2      1.155              941.9
5                0.359        227.1      0.352              193.5
6                1.307        840.0      1.302              733.5
7                0.797        521.9      0.809              877.4

objective fct.   804.2        809.2                         847.2
β                4.256        4.256                         4.237
iterations       379          636                           5
number of RA     379          636                           75

Table 3.2: Comparison of optimal solutions obtained by COSSAN-NLPQL with
reference solution (see [10])



3.2 Posterior Approximation of Reliability Measures 

3.2.1 General Remarks 

The basic aim of this approximate method is to reduce the numerical effort of the
reliability analyses required for RBO and for sensitivity analysis with respect to
the decision parameters. For completeness the general RBO problem formulation is
considered, which may be written as follows

$$f(\mathbf{x}, \mathbf{r}(\mathbf{x})) \to \min \qquad (3.6)$$

defined on the feasible domain S

$$\mathbf{x} \in S \subset \mathbb{R}^n, \quad \mathbf{r} \in D \subset \mathbb{R}^m \qquad (3.7)$$

In each iteration the functionals and the actual gradients have to be evaluated,
which would require repeated calculations of the reliability measures r. As
already mentioned, the numerical effort becomes impracticable, particularly for
RBOs of larger complex systems. Therefore, having in mind the response surface
methodology, one could try to represent the considered reliability measures by
an approximate function of the design variables. The advantage is that
during the optimization procedure the reliability measures r can be approximated
by evaluating the explicit functions $\hat{\mathbf{r}}$. The following subsection discusses
which type of reliability measure can be approximated with sufficient accuracy
over the complete design space. The optimization procedure is then
demonstrated considering again the truss structure treated above.

3.2.2 Approximation of Probability of Failure 

Generally one could approximate the first order reliability index as well as the
failure probability as functions of the design variables. However, when using the
RSM there is basically no need to introduce approximate first order reliability
indices, as simulating the random variables to estimate the failure probabilities
requires no real additional numerical effort. Moreover, practical experience
suggests that estimates of failure probabilities ensure, at least to some extent,
numerical continuity during the iterative procedures, which is not the case
for first order reliability indices. It was suggested that for small values the
failure probability can be approximated with sufficient accuracy by simple
exponential functions [6]. Provided that the failure probability is small over the
complete design space, the following approximation is proposed



where

$$\hat{p}_f(\mathbf{x}) = \exp\left(a_0 + \mathbf{b}^T\mathbf{x} + \mathbf{x}^T\mathbf{C}\,\mathbf{x}\right) \qquad (3.9)$$

The parameters $a_0$, $\mathbf{b}$, $\mathbf{C}$ are polynomial coefficients which have to be determined
by solving a linear equation system. The equation system is assembled by
estimating failure probabilities for specific design variable combinations and
interpolating the polynomial function in the exponent of eq. (3.9), where the
supporting points for the approximate probability function are given by

$$\ln \hat{p}_f(\mathbf{x}_j) = \ln p_f(\mathbf{x}_j) \quad \forall\; g(\mathbf{x}_j, \mathbf{y}) = 0, \quad j = 1, \ldots, k \qquad (3.10)$$
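
The interpolation step can be sketched for a single design variable, where the exponent of eq. (3.9) reduces to a0 + b*x + c*x^2 and three support points make the linear system determinate. The support values below are hypothetical; in the procedure described here they would come from reliability analyses with the RSM and importance sampling. The small Gaussian elimination routine is included only to keep the sketch self-contained.

```python
import math

def solve_linear(a, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_log_quadratic(xs, pfs):
    """Interpolate ln pf = a0 + b*x + c*x**2 through exactly three support points."""
    rows = [[1.0, x, x * x] for x in xs]
    return solve_linear(rows, [math.log(p) for p in pfs])

def pf_hat(coeffs, x):
    """Explicit approximate failure probability, 1-D version of eq. (3.9)."""
    a0, b, c = coeffs
    return math.exp(a0 + b * x + c * x * x)

# Hypothetical support points: design variable value -> estimated failure probability
xs = [0.28, 0.30, 0.32]
pfs = [1.0e-3, 2.0e-4, 5.0e-5]
coeffs = fit_log_quadratic(xs, pfs)
print(pf_hat(coeffs, 0.31))
```

Fitting the logarithm rather than the probability itself keeps the system linear in the coefficients and guarantees a positive approximate failure probability.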



The probabilities $p_f(\mathbf{x}_j)$ are the failure probabilities estimated using the RSM
and importance sampling. The number of interpolation points k taken into account
depends on the number of considered design variables and can be determined
according to the following relation:



2



(3.11)

The number of interpolation points proposed in eq. (3.11) ensures that the
equation system for evaluating the coefficients in eq. (3.9) is determinate.
Certainly, the goodness of approximation of the RS may depend on the number of
interpolation points used. However, this cannot be claimed generally. A general
rule for choosing the interpolation points is not available. The only criterion
that can be taken into account is that the supporting points should cover almost
the complete feasible domain. In most cases, where the feasibility of design
variable combinations cannot be assessed in advance, the interpolation points
will be distributed within the range of the side constraints.

3.2.3 Simple Truss Structure 

The posterior approximation strategy is illustrated by performing an RBO of the
truss structure already investigated above. The same problem definition is
considered, in which a weight minimization subject to a single system reliability
constraint is carried out. The approximation of the system probability of failure,
i.e. of the constraint function, is based on 70 interpolation points.
For this purpose 70 reliability analyses have been performed. The
component limit state functions are given in explicit form. Following this
calculation step a linear equation system is formed and solved to evaluate the
probability function coefficients. At this point the actual optimization procedure
can be executed without any response calculations required for the reliability
analyses. The numerical effort is only due to the calculation of the interpolation
points for the approximation of the probability function. Hence, the main
advantage of this process is that the reliability analyses are reduced to almost
simple function calculations.
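
Once the failure probability is available as an explicit function, the optimization reduces to cheap function calls. The sketch below illustrates this for a single cross-section, where minimizing the weight is equivalent to finding the smallest area that satisfies the reliability constraint. The exponential function and its coefficients are hypothetical, and bisection stands in for the NLPQL algorithm used by the authors.

```python
import math

def pf_hat(area, a0=2.0, b=-0.02):
    """Hypothetical explicit failure probability function, exp(a0 + b*area)."""
    return math.exp(a0 + b * area)

def min_area_for_target(pf_target, lower, upper, tol=1e-9):
    """Bisection for the smallest area with pf_hat(area) <= pf_target,
    assuming pf_hat decreases monotonically on [lower, upper]."""
    if pf_hat(upper) > pf_target:
        raise ValueError("target not reachable within the side constraints")
    if pf_hat(lower) <= pf_target:
        return lower
    while upper - lower > tol:
        mid = 0.5 * (lower + upper)
        if pf_hat(mid) <= pf_target:
            upper = mid
        else:
            lower = mid
    return upper

# Weight is proportional to the area, so the lightest feasible design is the
# smallest area at which the reliability constraint becomes active.
area_opt = min_area_for_target(1.0e-5, 100.0, 1500.0)
print(area_opt, pf_hat(area_opt))
```

As in the truss example, the constraint is active at the optimum: the returned area is the point where the approximate failure probability just reaches the target value.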



                    reference solution                     COSSAN using NLPQL
                 feasible                                  (70 supports)
variable         direction        penalty
number A_i       [in²]       [mm²]       [in²]             [mm²]       [in²]

1                1.177        745.8      1.156              645.2      1.000
2                1.845       1218.7      1.889                -        1.670
3                1.139        752.9      1.167              645.2      1.000
4                1.177        745.1      1.155                -        1.000
5                0.359        227.1      0.352                -          -
6                1.307        840.0      1.302                -        1.000
7                0.797        521.9      0.809                -          -

objective fct.   804.2        809.2                         733.2
β                4.256        4.256                         4.256
iterations       379          636                           6
number of RA     379          636                           90

Table 3.3: Comparison of optimal results due to posterior approximation with
reference solution, 70 interpolation points (see [10])

A further advantage is that when probability functions are available in explicit
form the optimization problem can be individually modified without any severe
additional numerical effort. For example, the failure consequence cost factors can
be changed and also the complete problem definition can be rearranged. As long as
no additional or alternative failure criteria or distribution parameters have to be
introduced, the numerical effort is confined to the pure nonlinear optimization
procedure carrying out explicit function calls. Sensitivity studies of parameters of
the problem functions can be carried out without any problem.

The results obtained with the posterior approximation method are
compared with the reference solution in Table 3.3. All seven design variables
show differences, and the optimal solution does not fit as well as the solution
obtained with the direct approach. However, the principal tendency of the
design variables towards the optimal solution is maintained. Hence, the results are
sufficiently accurate for decision making within the scope of practical applications.



4 Summary 

Some basic principles of RBO of structures and mechanical components have been
presented. Two different approaches are discussed in some detail.

Direct Optimization Approach. It was shown that by the use of COSSAN-NLPQL
and pursuing the direct optimization approach an almost exact agreement
with the reference solutions, obtained by the feasible direction method and the
penalty function method, is achieved. The required numerical effort is considerably
less than when applying these reference optimization methods. Moreover, with
COSSAN-NLPQL it is possible to apply the RSM, which is certainly a big advantage
with respect to applications of practical significance. There is generally no
restriction on the reliability analysis method used, e.g. the directional sampling
method could be applied as well. Consequently, RBO of complex structures can be
performed where other methods based on explicit and often simplified limit state
formulations fail. This is primarily the case when nonlinear structural behaviour
and non-Gaussian distributions have to be taken into account.

Posterior Approximation Approach. Based on the proposed approximation this
strategy allows the utilization of reliability-based structural optimization in
engineering practice. The numerical example showed that the approximate results
are not of the same accuracy as those obtained by the direct approach. This is
to be expected as, in addition to the limit state, the failure probability function
is also approximated. Nevertheless, all the design variables tend to the optimal
reference solution. It follows that for problems where the limit state function can
be formulated explicitly the posterior approximation method does not show any
advantage. In this case the direct RBO approach yields better results, and more
confidence in them, without a considerable increase of the numerical effort.
With regard to sensitivity studies this argument is still valid (recall: no
additional structural analyses are required). The advantage of this strategy becomes
particularly evident for those RBO problems where the number of design variables
is low and the effort within the scope of the reliability analysis is high.
Approximations are required in any case and hence the use of the posterior
approximation approach is most advantageous. Especially when sensitivity
analyses become necessary for high quality decision making this method is
preferable, although inaccuracies such as those discussed here are to be expected.



Stochastic Optimization Approach
to Dynamic Problems with
Jump Changing Structure

We consider dynamic optimization models whose structure (functional,
equations, constraints) can change over time depending on the control strategy.

The main problem is the choice of an optimal structure and of a strategy which
provides an optimal transition from one structure to another.

The problem under discussion is connected with modelling global changes
in the economic mechanism (for example, a transition from centralization to
market), radical technological innovations and so on.

In the deterministic case these problems are nonconvex and nonsmooth. We
propose a general approach to such models based on their stochastic
approximations and obtain a stochastic programming problem with controlled measure.

This approach is illustrated by an economic dynamic model with endogenous
innovations.



1 General Economic Dynamic Model 

We study the general multi-sector economic dynamic model with discretely
expanding technologies. The moments of emergence of new technological modes
(new technological structures) are defined by given levels of expenditure on
R&D (research and development) and by the strategy of investments into
R&D. The model may be written in the following form:

$$\sum_{k=0}^{N} \sum_{t=\theta_k}^{\theta_{k+1}-1} \varphi_k(a_t, b_{t+1}) \to \max,$$

$$(a_t, b_{t+1}) \in T_k, \quad \theta_k \le t < \theta_{k+1}, \quad (c_t, d_{t+1}) \in Q_k, \quad b_t \ge c_t + a_t,$$

$$\theta_k = \min\Big\{t : \sum_{j=\theta_{k-1}}^{t} d_j \ge \xi_k\Big\}, \quad d_{\theta_k} = 0, \quad k = 1, \ldots, N, \quad \theta_{N+1} = \tau - 1,$$

where the convex technological set $T_k$ (i.e. a set of 'input-output' vectors (a, b))
and the concave utility function $\varphi_k$ form the structure of the economic system; the
vector $\xi_k$ is the level of expenditures necessary for the transition from the
structure $(\varphi_{k-1}, T_{k-1})$ to $(\varphi_k, T_k)$, $\theta_k$ is the corresponding transition
moment, and the convex set $Q_k$ specifies the dynamics of assets in R&D for a unit
interval.

This model is nonconvex in general. We use a stochastic version of the
initial model, which takes into account the uncertainty of expenditure levels
on R&D and the incompleteness of information on parameters of future
technologies. The stochastic model is already locally convex. This fact allows us
to formulate the stochastic maximum principle for the model and to find the
optimal structure as well as the optimal strategy of transition.

For the simple case (N = 1) this problem was studied in [1]. A model of structural
transition from a centralized economy to a market economy was considered in
[2]. The analysis of the above model essentially uses the following generalization
of the stochastic maximum principle from [3].

2 Stochastic Maximum Principle 
with Controlled Measure 

Let $\{\eta_t,\ t = 0, 1, \ldots, \tau\}$ be a stochastic process with values in a measurable
space $(S, \mathcal{E})$ with transition functions $\Pi^{t+1}(\eta^t, x_t, u_t, d\eta_{t+1})$, depending
measurably on the process $x_t \in \mathbb{R}^n$ and on the control $u_t \in U$, where U is a
Polish space, $\eta^\tau = (\eta_0, \ldots, \eta_\tau)$, $\tau < \infty$. The process $x_t$ is described by the
system of difference equations:

$$x_{t+1} = F^{t+1}(\eta^{t+1}, x_t, u_t), \qquad x_0 = x_0(\eta_0) \qquad (1)$$

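Suppose that the measures $\Pi^{t+1}(\eta^t, x_t, u_t, \cdot)$ are absolutely continuous with
respect to some (fixed) transition measure $P^{t+1}(\eta^t, d\eta_{t+1})$, with the
corresponding density taken with respect to $P^{t+1}$. Each control $u_t = u_t(\eta^t)$
generates a measure $\Pi^u$ on the space of sequences $\eta^\tau$. It is required to maximize
the functional:

$$E^u \sum_{t=0}^{\tau-1} \varphi^t(\eta^t, x_t, u_t) \qquad (2)$$

subject to restrictions

$$E^u \sum_{t=0}^{\tau-1} \Phi^t(\eta^t, x_t, u_t) \ge 0, \quad g^t(\eta^t, x_t, u_t) \le 0 \quad (u_t \in U), \quad (P\text{-a.s.}), \qquad (3)$$

where P is a measure generated by the initial distribution $P_0(d\eta_0)$ and the
transition function $P^{t+1}(\cdot)$; the function $\varphi^t(\cdot)$ is scalar-valued, while $\Phi^t(\cdot)$
and $g^t(\cdot)$ are vector-valued.

Let us denote

$$H^{t+1}(\eta^{t+1}, x, u) = \alpha\,\varphi^t(\eta^t, x, u) + a\,\Phi^t(\eta^t, x, u) + \psi_{t+1} F^{t+1}(\eta^{t+1}, x, u) - \lambda_t\, g^t(\eta^t, x, u).$$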



Next we formulate assumptions on the functions = 

required to obtain necessary optimality conditions for the 

above problem. 

General conditions. 

(Al) The functions given by (p^{j]\x,u), t = 0,...,r- 1 , are jointly 
measurable and the composition (p^ Xt{’),Ut{')) G Li(S^) for all admissible 
pairs {(xt, Ut)}. At each point ( 77 ^, x, u) the functions given by <^^( 77 ^, ar, u) are 
differentiable with respect to x and their derivatives (f\,{rf^x^u) (gradients 
with respect to x) are continuous in x and satisfy the following condition: 

for any bounded set C C there exists a function G Li{S^) such 
that for all X £ C 

(A2) The vector-valued functions given by , x,u), g\rj\ x,u), t = 

0, . . — 1 , are jointly measurable. The set Ut(ff) depends measurably on 

the parameter ff, ^ = 0 , . . . , r — 1 . 

(A3) For any bounded set C C R^ there exists a constant Kc such that 

\F*+^{i]^+^,x,u{Tf))\ + \g\T]*,x,u(Tf))\ < Kc a.s. 

for all X ^ C and t = 0,l,...,r — 1 . The derivatives of the constraint func- 
tions X, u) and gl;{rf ,x, u) with respect to x exist and the constant 

Kc which corresponds to every bounded set C C R^ has the further properties 
that 

+ < Kc a.s. 

for all X and ^ = 0, 1, . . . , r - 1, and 

+ \9l{ri\3:x,u{rf)) - gl{rf ,X 2 ,u{rf))\ < Kc\xi - X2\ a.s. 
for all xi,X 2 eC and / = 0 , 1 , . . ., r - 1 . 

Convexity condition. 

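(B) For any set of parameters $\{\eta^t, x, u', u'', \alpha\}$, where $u', u'' \in U_t(\eta^t)$,
$x \in \mathbb{R}^n$, $0 \le \alpha \le 1$ and $0 \le t < \tau$, an element $u_\alpha \in U_t(\eta^t)$ may be
found such that the following relations hold:

$$\varphi^t(\eta^t, x, u_\alpha) \ge \alpha\,\varphi^t(\eta^t, x, u') + (1-\alpha)\,\varphi^t(\eta^t, x, u''),$$

$$F^{t+1}(\eta^{t+1}, x, u_\alpha) = \alpha\,F^{t+1}(\eta^{t+1}, x, u') + (1-\alpha)\,F^{t+1}(\eta^{t+1}, x, u'') \quad P^{t+1}(\eta^t, \cdot)\text{-a.s.},$$

$$g^t(\eta^t, x, u_\alpha) \le \alpha\,g^t(\eta^t, x, u') + (1-\alpha)\,g^t(\eta^t, x, u'').$$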



Regularity condition. 

(C) For each 0 <t < r — 1, one can find a control Ut G Ut[rf) a.s. for 
which E\(f^ ( t]* , , Ut)\ < oo and 

< --fe a.s., 

for some positive 7 and ^uniV vector e := ( 1 , 1 ,...,!). 

Theorem 1. (Maximum Principle) Let be a solution of the prob- 

lem (l)-(3), and H* be the corresponding measure on the space of histories 
Then there exist non-trivial vector a = {a, a) G > 0, and 

functions H*), A« e Lf (5‘, H*), At >0, he 

L\{S*,E^,P) such that: 

1) u; = arg max {£’*[/f‘+i(r/‘+i,Xt*,w)| 77 *]+ f h+ilV+^(r}\x*t,u,drit+i)} 

u&U J 

(n* - a.s.); 

2) xl:t = E*[Hi+\rf+\x*,u;)\rf]F J h+iUl+\rj\x;,u,dvt+i), V'r = 0 

(n*- a.s.); 

3) ht=aip^{T]*,x;,u;)+ J h+ilVJ-\r)\x*,u,dr)t+i), hr = 0 

(P-a.s.); 

4) ,x*,u*) = 0; 

5) hg^if ,x*,u*) = a (n*-a.s.)- 

This result extends similar ones in [3,4] and can be proved following the 
arguments used in [3]. 

3 The Model with Jump Changing Structure. 
Supporting Prices 

Let us return to the model described in Section 1 and assume that the level of
expenditure which is necessary for the transition to the structure $(\varphi_k, T_k)$ is a
random vector with a given distribution function $\pi_k(y)$, $0 < k \le N$. Moreover,
we assume that before the moment $\theta_k$ when the structure $(\varphi_k, T_k)$ emerges, the
characteristics of this structure are known incompletely. Formally this is
expressed by a dependence of the set $T_k$ and the function $\varphi_k$ on some parameter
$s \in S_k$. Before the moment $\theta_k$ this parameter is assumed random, and its
particular value $s_k$ becomes known only after emergence of the structure $(\varphi_k, T_k)$,
i.e. after the moment $\theta_k$. We assume that $(S_k, \mathcal{E}_k, P_k)$ is a probability
space, and the distributions $P_k(ds)$ (k = 1, ..., N) are mutually independent.
Everywhere below we shall assume that all functions depending on $s_1, s_2, \ldots$
are measurable (with respect to the product $\mathcal{E}_1 \times \ldots \times \mathcal{E}_N$) and all relations
between these functions hold almost surely with respect to the product measure
$P_1 \times \ldots \times P_N$.

A transition program is defined as follows 

= {^('(^ 1 . • • - k = 0,...,N, t>0k, l<0i <02 < ...<0N <r} 

(4) 

where functions 

z'^{0i,...,0k) = {{alb1^,):= (a1{0r,si,...,0,,s,),b^+,{0i,si,...,0k,Sk), 

(5) 

(cnc^f'+i) := . . .,0k,Sk),dt^j^(0i,si, . . .,0k,Sk))}, t > 0k, ( 6 ) 

satisfy the constraints: 

(at,bt+i) eTk{sk), (cf.^+i) eQfe- >cf + a*, 0k<t<0k+i, 

t 

0k = min{< : ^ 4 ^ = 0 . 

j=0k-i 

The initial resource vector bo is assumed given. 

It is required to find a transition program Z (as in (4)-(6)) which maximizes
the functional:



N 

, 5/j)]=>max. (7) 

k=0t=9k 

In what follows we shall assume that each function $\pi_k(y)$ is continuously
differentiable in all the arguments, and $\pi_k(0) = 0$.

For almost all $(s_1, s_2, \ldots, s_N) \in S_1 \times \ldots \times S_N$ the sequence of the corresponding
technologies is assumed to be non-decreasing:

$$T_0 \subset T_1(s_1) \subset T_2(s_2) \subset \ldots \subset T_N(s_N).$$

The sets $Q_k$, $T_0$, $T_k(s)$, $k = 1, \ldots, N$, for each s are convex, and the
functions $\varphi_k(a, b, s)$ are concave in $(a, b) \in T_k(s)$ for every $s \in S_k$,
$k = 0, 1, \ldots, N$; in addition $\varphi_{k+1}(a, b, s_{k+1}) \ge \varphi_k(a, b, s_k)$ for each pair
$(a, b) \in T_k(s_k)$. The sets $Q_k$ are assumed bounded and contain the zero point (0,0).

Definition. We say that the sequences of nonnegative vector functions 

= a^(euSu...,9k,Sk), 9„<t<0k+i, 

with values in and respectively, the sequence of nonnegative scalar 
functions = h^{9i,si, . . .9k,Sk) support the transition program Z (of the 
form (4)-(6)), if the following conditions are satisfied: 

A. for each 0 < k < N,9k <t < 9k-\-i 



where 



argmax [v?''(a, 6, Sfe) + - V'fa], 

(a,b)£Tk(sk) 



W4-1 = ~i ^ r~k\ / n+i P(dsk+i), 

+ l-TTfe + iyf ^ l-TTfc + 12/f J5 



2/t = E 

j=tk 



= V’?(af+4) 



B- (c?,4+i) = arg max [a'^^^d - rp^c], 
(c,d)eQk 

C. The prices satisfy the relationship 



9k <t < 9k+i 



a 



k 

t 



_ fc i-7Tfc+i(t/h , ,k <+iiyt) 

- 7rfe+i(t/J'J * 1 - 7r*+i(j/*_i) 



1 



9k <i < 9k+i, 

where is the value of functional on the program Z under the condition 

that the structure {(fk-\-uTk^i) emerges at the moment t, i.e. 



N Oj+i 

W,^+^ = E[Y, E^( j H+1^ si)\9^^tust, = su...,9k=h, 

j=k+ii=dj 

^tk ~ ^k^ 9k-\-l — ^]. 



Theorem 2. Let the transition program Z of the form (4)~(6) be optimal 
in problem (7) and let the following additional condition holds: 



( 2 /) denotes a vector of partial derivatives (gradient) of the function 7Tk^i(y)
there are technological processes (a^(s/e), 6^(s/c)) G Tk{sk), such that
> a^{sk), 9k <t< 9k^i, 0 < fc < 

Then there exist prices supporting the transition program Z. 

The proof of this theorem is based on the version of the Stochastic Maximum
Principle (Theorem 1) stated in Section 2.

Acknowledgement. This work is partially supported by Russian Foun-



Abstract. Various aspects of the robust optimization approach [11] are
discussed in the context of scenario based stochastic linear programs. The main
items are the choice of the model parameter, which can be related to
properties of nonlinearly perturbed linear programs [10] or of parametric quadratic
programs [1], and an extension of the first results on the robustness of the
optimal value with respect to probabilities of the selected scenarios and with
respect to out-of-sample scenarios, cf. [5].

Keywords. Robust optimization, tracking model, scenario based stochastic 
programs, postoptimality, contamination technique, out-of-sample scenarios 



1 Introduction 

Let us consider various approaches to the mathematical formulation of decision
problems under stochastic uncertainty about the future values of the system
parameters. We assume that the initially available information consists of a
finite number of possible batches of these parameters, called scenarios, and
that they are complemented by probabilities of their occurrence. A frequent
requirement is to decide before the realization of one of these scenarios is
known; there is an option to update this decision and/or to recompute the cost
of the total decision procedure after the information about which scenario
occurs is revealed. Given the initial information, the goal is to get the best
possible scenario-independent initial decision. The decision criterion can be,
for instance, the lowest expected cost, the lowest variability of costs under
individual scenarios, etc. Its choice depends on the problem to be solved.

Example 1. A scenario-based two-stage stochastic linear program (SLP) with
random relatively complete recourse can be written in the form:

Minimize

(1)   $c^T x + \sum_{s=1}^{S} p_s q_s^T y_s$

subject to

(2)   $A x = b$
      $T_1 x + W_1 y_1 = h_1$
      $T_2 x + W_2 y_2 = h_2$
      $\dots$
      $T_S x + W_S y_S = h_S$
      $x \ge 0,\ y_s \ge 0,\ s = 1, \dots, S$

where $\omega_s = [q_s, T_s, W_s, h_s]$, $s = 1, \dots, S$, are scenarios and
$p_s > 0$, $s = 1, \dots, S$, are their probabilities, $\sum_s p_s = 1$. The
first-stage decision $x$ is scenario independent and, for each of the
considered scenarios, second-stage decisions $y_s$ are introduced to maintain
the constraints at the minimal additional cost. The assumption of relatively
complete recourse means that the set of feasible solutions (2) is nonempty.

Supported by the Grant Agency of the Czech Republic under grants No. 201/96/0230
and 402/96/0420.

The problem (1) - (2) can be rewritten as
minimize

(3)   $c^T x + \sum_{s=1}^{S} p_s\, q(x, \omega_s)$

on the set

(4)   $\mathcal{X} = \{x \mid A x = b,\ x \ge 0\}$

where

(5)   $q(x, \omega_s) = \min_{y_s} \{ q_s^T y_s \mid W_s y_s = h_s - T_s x,\ y_s \ge 0 \}$

For this type of problem, the criterion is the minimal expected cost of the 
two-stage decision process. 

Example 2. The objective function (3) from Example 1 can be modified to
minimize

(6)   $-\sum_{s=1}^{S} p_s\, u(c^T x + q(x, \omega_s))$

on the set (4) and with notation (5). This criterion follows the principle of
maximal expected utility, and $u$ is assumed to be a concave nondecreasing
utility function.

Example 3. A robust optimization model (RO) (cf. [11]) which corresponds
to (1), (2) is:

Minimize

(7)   $\sum_{s=1}^{S} p_s \xi_s + \lambda \sum_{s=1}^{S} p_s \left[ \xi_s - \sum_{j=1}^{S} p_j \xi_j \right]^2$

subject to

(8)   $A x = b$
      $T_s x + W_s y_s = h_s$
      $c^T x + q_s^T y_s - \xi_s = 0$
      $x \ge 0,\ y_s \ge 0,\ s = 1, \dots, S$

The newly introduced variables $\xi_s$ are equal to the cost of the decision
$x$ plus the corresponding cost of its compensation or of the recourse activity
$y_s$ if the scenario $\omega_s$ occurs. The additional term in the objective
function equals the variance of the random costs $\xi$, and its weight in the
objective function is expressed via a scalar parameter $\lambda \ge 0$. The
objective function (7) can be related to a bicriteria optimization problem
where the first criterion coincides with minimization of the objective function
used in Example 1. Hence, the efficient solutions which are obtained by solving
RO for different values of $\lambda$ can also be computed as follows:

Minimize 



( 9 ) 

subject to ( 8 ) and 



with an appropriately chosen parameter value $\gamma$. Moreover, similarly as
in [9], the optimal solution of this program can be regarded as an
approximation of the optimal solution for the (minus) expected utility
criterion applied to the costs $\xi$. The parameter $\gamma$ identifies the
point about which the approximation of the utility function by its Taylor
expansion is used.

Variance of the random costs $\xi$ is only one of the possible choices of
additional criteria suggested in [11]; see also Example 7.



Example 4. The mean-variance model (M-V) for the minimal recourse
costs $q(x, \omega_s)$ can be written as
minimize

(10)  $c^T x + \sum_{s=1}^{S} p_s\, q(x, \omega_s) + \lambda \sum_{s=1}^{S} p_s \left[ q(x, \omega_s) - \sum_{j=1}^{S} p_j\, q(x, \omega_j) \right]^2$

on the set (4) and with the minimal recourse costs $q(x, \omega_s)$ defined by
(5). In contrast with the robust optimization problem, this criterion works
with the minimal recourse costs $q(x, \omega_s)$ and it deals with two
conflicting criteria: the minimal total expected costs and the minimal
variability of these costs.

Example 5. The tracking model (see [2]) related to (1) - (2) can be formulated
as follows: Let $v_s$, $s = 1, \dots, S$, be the optimal values of the
individual scenario problems
minimize

(11)  $c^T x + q_s^T y_s$

subject to

(12)  $A x = b$
      $T_s x + W_s y_s = h_s$
      $x \ge 0,\ y_s \ge 0.$

Then the basic compromising or tracking model is
minimize

(13)  $\sum_{s=1}^{S} p_s \left( \| c^T x + q_s^T y_s - v_s \| + \| T_s x + W_s y_s - h_s \| \right)$

subject to

      $A x = b$
      $x \ge 0,\ y_s \ge 0,\ s = 1, \dots, S$

The first- and second-stage solutions obtained by solving this problem track
the optimal solutions of the individual scenario problems (11), (12) as closely
as possible. The norm in (13) can in principle be chosen in an arbitrary way;
its choice influences the solution procedure.

Example 6. SLP with restricted recourse [12] aims at limitation of the dis- 
persion of the recourse decisions. According to the principles of multicriteria 
optimization, the objective function (1) should be extended to 

with a nonnegative parameter A or the constraints (2) extended for an addi- 
tional constraint 

(14) II ^ ^ 

where e is a chosen tolerance level. Models of this type capture features of 
both robust optimization and tracking model. 

Example 7. Another possibility is to replace the objective function in the
RO problem (7) - (8) by a general performance function of the decision
variables $x, y_s$, $s = 1, \dots, S$, and in addition, to relax the
constraints; cf. [11]. The resulting model can be formulated as follows:

Minimize

      $\phi(x, y_1, \dots, y_S) + \mu\, \rho(u_1, \dots, u_S)$

subject to

(2')  $A x = b$
      $T_s x + W_s y_s + u_s = h_s$
      $x \ge 0,\ y_s \ge 0,\ s = 1, \dots, S$

The first term in the objective function corresponds to (7); the second one,
with a parameter $\mu > 0$, penalizes possible violations of the initial
second-stage constraints $T_s x + W_s y_s = h_s$.

From the modeling point of view the choice among the introduced models
depends on the nature of the real life problem to be solved, on the numerical
tractability of the resulting optimization problem and also on properties such
as the sensitivity of the solution to the input data, i.e., to the choice of
scenarios and their probabilities, to the errors in scenarios, etc. In this
paper, we shall concentrate on properties of the optimal solution of the robust
optimization problem (7), (8) in comparison with those for the stochastic
linear program (1) - (2) and for the corresponding mean-variance model; in the
terminology of [10] it means that our focus is exclusively on solution
robustness but not on model robustness. In the next Section we shall give a
comparison of optimal solutions of the three mentioned models on a simple
example; we shall see that the differences between optimal solutions appear
only for sufficiently large values of the parameter $\lambda$. A detailed
analysis of this phenomenon in the general case is the main subject of
Section 3. Section 4 is devoted to the resistance of the optimal value of the
RO problem with respect to inclusion of additional scenarios; using the
contamination technique we shall derive global bounds for the optimal value of
RO under assumptions comparable with those for SLP. The numerical illustration
of the obtained results comes from [3].



2 An Illustrative Example: The Newsboy Problem 

The well known newsboy problem can be stated as follows:

A newsboy sells newspapers at the price $c$ each. Before he starts selling,
he has to buy the daily supply at the cost $b$ a paper, $c > b > 0$. The demand
is random and the unsold newspapers are returned without refund at the end
of the day. How many newspapers should he buy?

In the framework of scenario based stochastic decision models introduced
in Section 1, one assumes that the demand is random with a known discrete
distribution concentrated at $0 < \omega_1 < \omega_2 < \dots < \omega_S$ with
probabilities $p_s > 0$, $s = 1, \dots, S$, $\sum_s p_s = 1$, and we get:

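2.1 The SLP formulation

(15)  $\min_{x \ge 0} \left[ (b - c)x + c \sum_{s=1}^{S} p_s (x - \omega_s)^+ \right]$

which can be also written as

(16)  $\min_{x \ge 0,\ y_s \ge 0\ \forall s} \left[ (b - c)x + c \sum_{s=1}^{S} p_s y_s \right]$

subject to

(17)  $x - y_s \le \omega_s, \quad s = 1, \dots, S$

2.2 The M-V formulation

minimize

(18)  $(b - c)x + c \sum_{s=1}^{S} p_s (x - \omega_s)^+ + \lambda c^2 \sum_{s=1}^{S} p_s \left[ (x - \omega_s)^+ - \sum_{j=1}^{S} p_j (x - \omega_j)^+ \right]^2$

subject to $x \ge 0$.

2.3 The RO formulation

(19)  $\min_{x \ge 0,\ y_s \ge 0\ \forall s} \left\{ (b - c)x + c \sum_{s=1}^{S} p_s y_s + \lambda c^2 \sum_{s=1}^{S} p_s \left[ y_s - \sum_{j=1}^{S} p_j y_j \right]^2 \right\}$

subject to (17).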



To understand the differences, we shall solve these three problems assuming
that there are only two extremal scenarios of demand,
$0 < \omega_1 < \omega_2$, with probabilities $p_1$, $p_2 = 1 - p_1$.

For the SLP formulation 2.1 it is enough to evaluate and to compare the
values of the objective function (15) at the points $x = \omega_1$,
$x = \omega_2$, which gives $(b - c)\omega_1 < 0$ and
$(b - c)\omega_2 + p_1 c(\omega_2 - \omega_1)$, respectively. The optimal
decision is

      $x_{SLP} = \omega_1$                  if $b - p_2 c > 0$
      $x_{SLP} = \omega_2$                  if $b - p_2 c < 0$
      $x_{SLP} \in [\omega_1, \omega_2]$    if $b = p_2 c$
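
The comparison of the objective (15) at the demand points can be sketched in a few lines of Python. This is only an illustration: the function names `slp_cost` and `slp_decision` and the numerical data ($b = 1$, $c = 3$, demands 10 and 20) are assumptions made for the example, not taken from the paper.

```python
def slp_cost(x, b, c, omegas, probs):
    """Expected cost (15): (b - c) x + c * sum_s p_s (x - omega_s)^+."""
    unsold = sum(p * max(x - w, 0.0) for w, p in zip(omegas, probs))
    return (b - c) * x + c * unsold


def slp_decision(b, c, omegas, probs):
    """Pick a minimizer of (15) among the demand points.

    The cost is piecewise linear and convex in x with kinks only at the
    demand points, so a minimizer can be found among them.
    """
    return min(omegas, key=lambda x: slp_cost(x, b, c, omegas, probs))


# Hypothetical data, chosen only to hit both strict cases of the rule above.
b, c, omegas = 1.0, 3.0, (10.0, 20.0)
x_high = slp_decision(b, c, omegas, (0.5, 0.5))  # b - p2*c = -0.5 < 0
x_low = slp_decision(b, c, omegas, (0.8, 0.2))   # b - p2*c =  0.4 > 0
```

With these data the sign of $b - p_2 c$ alone decides between the two demand points, in agreement with the case distinction for $x_{SLP}$.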

Similarly for the M-V formulation 2.2, the objective function (18) attains
its minimum in the interval $[\omega_1, \omega_2]$; this gives three
possibilities

(20)  $x_{MV} = \omega_1$   if $b - p_2 c \ge 0$

      $x_{MV} = \omega_1 + \frac{p_2 c - b}{2 \lambda c^2 p_1 p_2}$   if $b - p_2 c < 0$
            and $\lambda \ge \lambda_0 := \frac{p_2 c - b}{2 c^2 p_1 p_2 (\omega_2 - \omega_1)}$

      $x_{MV} = \omega_2$   if $b - p_2 c < 0$ and $\lambda < \lambda_0$

Hence, for small values of $\lambda$, the optimal decision $x_{MV}$ is optimal
for SLP, too.
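
As a hedged numerical check of (20), the sketch below evaluates the M-V objective (18) for two scenarios and implements the closed-form decision. The names `mv_cost` and `mv_decision` and all numbers are illustrative assumptions.

```python
def mv_cost(x, lam, b, c, w1, w2, p1):
    """Objective (18) for two demand scenarios w1 < w2 with p2 = 1 - p1."""
    p2 = 1.0 - p1
    r1, r2 = max(x - w1, 0.0), max(x - w2, 0.0)   # unsold copies per scenario
    mean = p1 * r1 + p2 * r2
    var = p1 * (r1 - mean) ** 2 + p2 * (r2 - mean) ** 2
    return (b - c) * x + c * mean + lam * c ** 2 * var


def mv_decision(lam, b, c, w1, w2, p1):
    """Closed-form decision (20)."""
    p2 = 1.0 - p1
    if b - p2 * c >= 0:
        return w1
    lam0 = (p2 * c - b) / (2 * c ** 2 * p1 * p2 * (w2 - w1))
    if lam < lam0:                      # also covers lam == 0
        return w2
    return w1 + (p2 * c - b) / (2 * lam * c ** 2 * p1 * p2)


# Hypothetical data: b = 1, c = 3, demands 10 and 20, p1 = 0.5,
# so lambda_0 = 0.5 / 45, roughly 0.0111.
x_small_lam = mv_decision(0.005, 1.0, 3.0, 10.0, 20.0, 0.5)  # below lambda_0
x_large_lam = mv_decision(0.05, 1.0, 3.0, 10.0, 20.0, 0.5)   # above lambda_0
```

A brute-force search of `mv_cost` over a fine grid of $x$ values is an easy way to confirm that the closed form and the objective agree for a given data set.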

The analysis of the RO formulation 2.3 is more complicated. Consider
first the response of the model to an a priori chosen first-stage decision $x$,
which is given by minimization of

(21)  $c(p_1 y_1 + p_2 y_2) + \lambda c^2 \{ p_1 [y_1 - (p_1 y_1 + p_2 y_2)]^2 + p_2 [y_2 - (p_1 y_1 + p_2 y_2)]^2 \}$

subject to

(22)  $y_s \ge (x - \omega_s)^+ := a_s(x), \quad s = 1, 2$

The objective function (21) can be further rearranged to

(23)  $\min\ c(p_1 y_1 + p_2 y_2) + \lambda c^2 p_1 p_2 [y_1 - y_2]^2$

The optimal solution is attained at the boundary of the set (22). It equals
$a_s(x)$, $s = 1, 2$, for $\lambda$ small enough,

(24)  $\lambda \le \lambda^*(x) := [2 c p_1 (a_1(x) - a_2(x))]^{-1}$

or for $x \le \omega_1$ (i.e., for $a_1(x) = (x - \omega_1)^+ = 0$). Moreover,
for all $x > \omega_1$, $\lambda^*(x) > \lambda_0$. It means that for
$\lambda \le \lambda_0$ the RO model reduces to the M-V formulation, so that
their optimal decisions $x$ are identical and are optimal also for the SLP
model. (Compare with results of Section 3.2.)

For $x \le \omega_1$, the optimal value in (23) equals zero, so that the
overall objective function is $(b - c)x$ and its minimum is attained at
$x = \omega_1$.

For $x > \omega_1$ and for $\lambda > \lambda^*(x)$,
$y_1 = a_1(x) = (x - \omega_1)^+$ whereas
$y_2 = a_1(x) - \frac{1}{2 \lambda c p_1} > a_2(x)$. This means that for an
already selected $x > \omega_1$ and for large values of $\lambda$, the penalty
for purchasing $x$ can be calculated higher than necessary. Moreover, for
$\lambda \to +\infty$ we have $y_2 \to a_1(x)$, i.e.,
$y_1 = y_2 = (x - \omega_1)^+$.

For a fixed sufficiently large value of $\lambda$ and for $x > \omega_1$, the
RO optimization problem is

      $\min_{x > \omega_1} \left\{ (b - c)x + c(x - \omega_1) - \frac{p_2}{4 \lambda p_1} \right\}$

The objective function increases in $x$ and the optimal decision is not
attained. However, for $x \to \omega_1^+$, a "large" value of $\lambda$ means
$\lambda > \lambda^*(x)$ where $\lambda^*(x) \to +\infty$, so that
$\frac{p_2}{4 \lambda p_1} \to 0$ and the infimum equals $(b - c)\omega_1$.

Hence, the partial results valid for $x \le \omega_1$ and for large $\lambda$
with $x > \omega_1$ imply that for $\lambda \to \infty$, the optimal decisions
$x_{RO} \to \omega_1$, $x_{MV} \to \omega_1$ independently of probabilities
$p_1, p_2$. (Compare with the optimal solution for SLP and with results of
Section 3.3.)
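
The second-stage response described by (21) - (24) can be reproduced with a small sketch. The helper name `ro_response` and the data are assumptions for illustration; the rule used for $y_2$ is the stationarity condition of (23) clipped at the lower bound $a_2(x)$.

```python
def ro_response(x, lam, c, w1, w2, p1):
    """Minimize (23) over y1 >= a1(x), y2 >= a2(x) for two scenarios.

    Returns (y1, y2, optimal value of (23)).
    """
    p2 = 1.0 - p1
    a1, a2 = max(x - w1, 0.0), max(x - w2, 0.0)
    y1 = a1                       # raising y1 only increases both terms
    if lam == 0:
        y2 = a2
    else:
        # Stationary point of (23) in y2, kept above the bound a2(x).
        y2 = max(a2, a1 - 1.0 / (2 * lam * c * p1))
    value = c * (p1 * y1 + p2 * y2) + lam * c ** 2 * p1 * p2 * (y1 - y2) ** 2
    return y1, y2, value


# Hypothetical data: c = 3, demands 10 and 20, p1 = 0.5, decision x = 15,
# so a1 = 5, a2 = 0 and lambda*(x) = 1/15.
small = ro_response(15.0, 0.01, 3.0, 10.0, 20.0, 0.5)   # lam <= lambda*(x)
large = ro_response(15.0, 1.0, 3.0, 10.0, 20.0, 0.5)    # lam >  lambda*(x)
```

For $\lambda > \lambda^*(x)$ the returned value agrees with $c\,a_1(x) - p_2/(4\lambda p_1)$, which is the recourse part of the objective in the last displayed problem.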



3 Reflections on Robust Optimization 

3.1 The basic properties 

The model introduced in Example 3,
minimize

(7)   $\sum_{s=1}^{S} p_s \xi_s + \lambda \sum_{s=1}^{S} p_s \left[ \xi_s - \sum_{j=1}^{S} p_j \xi_j \right]^2$

subject to

(8)   $A x = b$
      $T_s x + W_s y_s = h_s$
      $c^T x + q_s^T y_s - \xi_s = 0$
      $x \ge 0,\ y_s \ge 0,\ s = 1, \dots, S$

is a quadratic convex program in variables
$z = [x, y_s, \xi_s, s = 1, \dots, S]$ and with a nonnegative scalar parameter
$\lambda \ge 0$. Let $Z \ne \emptyset$ be the set of feasible solutions defined
by (8). The objective function (7) can be written as

(25)  $p^T \xi + \lambda\, \xi^T P \xi$

where $p$ is the $S$-vector of components $p_s$, $\xi$ the $S$-vector of
components $\xi_s$ and

(26)  $P = \mathrm{diag}\{p\} - p p^T$

is a singular matrix. Given scenarios $\omega_s$, $s = 1, \dots, S$, the
optimal value depends on $\lambda$ and $p$; to emphasize this fact, it will be
denoted by $\varphi(\lambda, p)$. Similarly, the set of optimal solutions of
(7) - (8) will be denoted by $Z^*(\lambda, p)$.
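
That the quadratic form with the matrix (26) is the variance of the costs, and that $P$ is singular, can be checked on a small instance. The sketch below uses plain Python lists; the helper names and the sample vectors are illustrative assumptions.

```python
def build_P(p):
    """Matrix (26): P = diag(p) - p p^T, as a list of rows."""
    n = len(p)
    return [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(n)]
            for i in range(n)]


def quad_form(P, xi):
    """xi^T P xi."""
    n = len(xi)
    return sum(xi[i] * P[i][j] * xi[j] for i in range(n) for j in range(n))


def variance(xi, p):
    """Variance of the costs xi_s under probabilities p_s."""
    mean = sum(ps * x for ps, x in zip(p, xi))
    return sum(ps * (x - mean) ** 2 for ps, x in zip(p, xi))


# Hypothetical probabilities and costs.
p = [0.2, 0.3, 0.5]
xi = [4.0, -1.0, 2.5]
P = build_P(p)
row_sums = [sum(row) for row in P]   # P applied to the vector of ones
```

Since the probabilities sum to one, every row of $P$ sums to zero, so the vector of ones lies in the null space of $P$; this is one way to see the singularity mentioned above.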

The two-stage SLP (1) - (2) is equivalent to the robust optimization problem
(7) - (8) with the parameter value $\lambda = 0$: It can be written as

(27)  minimize $\sum_{s=1}^{S} p_s \xi_s$ on the set (8)

We assume that the set $Z^* := Z^*(0, p)$ of its optimal solutions is nonempty.
Its optimal value $\varphi(0, p)$ is a lower bound for the value of the
objective function (25) for all $z \in Z$ and $\lambda \ge 0$. Hence, it is
easy to prove the following properties:

Proposition 3.1. Provided that there is an optimal solution of the RO
problem for $\lambda = 0$, the RO problem has an optimal solution for all
$\lambda \ge 0$.

Proposition 3.2. Assume that $Z^* := Z^*(0, p)$ is nonempty. Then the
optimal value function $\varphi(\lambda, p)$ is a concave function of
$\lambda \in [0, \infty)$.

Proposition 3.3. Assume that $Z^* := Z^*(0, p)$ is nonempty and bounded.
Then there exists the derivative of $\varphi(\lambda, p)$ at $\lambda = 0^+$

(28)  $\frac{d}{d\lambda} \varphi(0^+, p) = \min_{z \in Z^*(0, p)} \xi^T P \xi \ge 0$

(an application of [8, Theorem 17]).

Notice that the differentiability property 3.3 can be extended to all
$\lambda$ for which $Z^*(\lambda, p)$ is nonempty and bounded.

Proposition 3.4. Let $z, z^*$ be two different optimal solutions of (7) - (8).
Then for their corresponding parts $\xi, \xi^*$

(29)  $p^T \xi = p^T \xi^*$ and $\xi^T P \xi = \xi^{*T} P \xi^*$

(see [1, Theor. A.4.3]).

3.2 The case of small $\lambda$

The robust optimization problem (7) - (8) with small $\lambda > 0$ can be
viewed as a perturbation of the SLP problem (1) - (2) or (27). At this moment
we can exploit general results of [9]:

Proposition 3.5. Let $Z^*(0, p)$ be nonempty and bounded. Then

(i) there exist $\bar{\lambda} > 0$ and $x, y_s, \xi_s$, $s = 1, \dots, S$,
which is an optimal solution of the two-stage SLP (27) and of the robust
optimization problem (7), (8) for all $\lambda \in [0, \bar{\lambda}]$.

(ii) For $\lambda$ small enough, $Z^*(\lambda, p) \subset Z^*(0, p)$.

(iii) If $z = [x, y_s, s = 1, \dots, S, \xi]$ is an optimal solution of the
convex quadratic program

(30)  minimize $\xi^T P \xi$ on the set $Z^*(0, p)$

then $z$ is an optimal solution of the two-stage SLP (27) and of the robust
optimization problem (7) - (8) for $\lambda > 0$ small enough.

The proof is a straightforward application of Theorem 1 of [10] in the case of
statement (i) and of Theorems 3 and 4 ibid. in the case of the remaining two
statements.

In the sequel we shall assume that the set of optimal solutions of the
two-stage SLP (1) - (2), i.e., the set $Z^*(0, p)$, is nonempty. Under this
assumption and for small values of $\lambda$, the robust optimization problem
(7) - (8) can thus be viewed as a regularization of the two-stage SLP (27).
Quite similar results hold true also for the M-V problem (10).

The above statements imply that differences between SLP and RO or M-V
can come into force only for sufficiently large values of the parameter
$\lambda$.

3.3 The case of large $\lambda$

For large values of $\lambda$, it is convenient to study instead of (25), (8)
the following parametric convex quadratic program

(31)  $\min_{z \in Z} \{ \alpha\, p^T \xi + \frac{1}{2}\, \xi^T P \xi \}$

with a nonnegative parameter $\alpha = (2\lambda)^{-1}$ in the linear part of
the objective function. Stability with respect to changes of the parameter
$\alpha$ follows from more general results by [1, Chapter 5.4 and 5.5],
according to which the solubility set $A$ of (31) can be decomposed into
finitely many convex stability sets: open intervals and isolated points.
Proposition 3.1 implies that the considered set of parameters $(0, \infty)$
belongs to the solubility set $A$ of program (31); it means that there is
$\alpha_1 > 0$ such that all parameter values $\alpha \in (0, \alpha_1)$ belong
to the same stability set, characterized by fixed indices of positive
components of optimal solutions $x(\alpha), y_s(\alpha)$, $s = 1, \dots, S$,
and that these optimal solutions are linear functions of $\alpha$ on
$(0, \alpha_1)$.

The interpretation of this result for the RO problem (7) - (8) explains the
robustness property valid for large values of $\lambda$:

Proposition 3.6. There exists $\lambda_1 > 0$ such that the optimal first-stage
solutions $x(\lambda, p)$ and the optimal compensations
$y_s(\lambda, p)\ \forall s$ in the RO problem (7) - (8) keep fixed indices of
zero components for all $\lambda > \lambda_1$.

This was the case of $y_1 = y_2 = 0$ for all large values of $\lambda$ in the
RO problem in Section 2. Notice that except for the evaluation of $\lambda_1$,
the result holds true for arbitrary values of probabilities $p_s$,
$s = 1, \dots, S$.
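
The link between the objective (25) and the parametric program (31) is a positive rescaling. A short derivation, assuming $\lambda > 0$, is sketched here for convenience:

```latex
p^T \xi + \lambda\, \xi^T P \xi
  = 2\lambda \left( \frac{1}{2\lambda}\, p^T \xi + \frac{1}{2}\, \xi^T P \xi \right)
  = 2\lambda \left( \alpha\, p^T \xi + \frac{1}{2}\, \xi^T P \xi \right),
\qquad \alpha = (2\lambda)^{-1} .
```

Because the factor $2\lambda$ is positive, both objectives have the same minimizers on the set (8), and $\lambda \to \infty$ corresponds to $\alpha \to 0^+$; the threshold in Proposition 3.6 can therefore be read as $\lambda_1 = (2\alpha_1)^{-1}$.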

4 Resistance with respect to Selected Scenarios and their 
Probabilities 



Up to now we have studied properties of the robust optimization problems in
dependence on the parameter $\lambda$, assuming that the scenarios $\omega_s$,
$s = 1, \dots, S$, and their probabilities $p_s$ have been fixed in advance.
Now we shall turn our attention to the question of resistance of the results
with respect to the mentioned input. This problem was opened in [5]. We shall
continue developing further the approach of [5], which is based on the
contamination technique [4]. We refer to [6] for a similar treatment of the
tracking model (Example 5) and for numerical experience related to an expected
utility model (Example 2).

We shall substitute $c^T x + q_s^T y_s$ for $\xi_s$ into the objective
function (7) and we shall apply the contamination technique to the problem
minimize

(32)  $\sum_{s=1}^{S} p_s [c^T x + q_s^T y_s] + \lambda \sum_{s=1}^{S} p_s \left[ q_s^T y_s - \sum_{j=1}^{S} p_j q_j^T y_j \right]^2$

subject to

(33)  $A x = b$
      $T_s x + W_s y_s = h_s$
      $x \ge 0,\ y_s \ge 0,\ s = 1, \dots, S$

with fixed scenarios $\omega_s = [T_s, W_s, h_s, q_s]$, $s = 1, \dots, S$, and
with a fixed parameter value $\lambda$. We keep the symbol $Z$ for the
corresponding canonical projection of the feasible set of (8), i.e., for the
fixed set of feasible solutions $z = [x, y_s, \forall s]$ described by (33); we
assume that the set $Z^*(\lambda, p)$ of optimal solutions of (32), (33) is
nonempty and bounded and we denote by $\varphi(\lambda, p)$ the optimal value
of (32), (33). The objective function $f(z; \lambda, p)$ defined by (32) is
concave in probabilities $p$ and convex with respect to all considered
variables $z = [x, y_s, \forall s]$. It can be again written in the form (25)
with $\xi_s = c^T x + q_s^T y_s$ or rewritten as

(34)  $f(z; \lambda, p) = c^T x + \sum_s p_s q_s^T y_s + \lambda \sum_s p_s \left[ q_s^T y_s - \sum_j p_j q_j^T y_j \right]^2$
                        $= c^T x + E_p\, q^T y + \lambda\, \mathrm{var}_p\, q^T y$

Concerning the structure of the set of feasible solutions $Z$ of (33), we
shall assume that the sets of feasible compensations

(35)  $\mathcal{Y}_s(x) = \{ y_s \mid W_s y_s = h_s - T_s x,\ y_s \ge 0 \}$

are nonempty for all $s$ and for any $x \in \mathcal{X}$,

(36)  $\mathcal{X} = \{ x \mid A x = b,\ x \ge 0 \}$

i.e., the assumption of relatively complete recourse.

For a fixed set of scenarios $\omega_s$, $s = 1, \dots, S$, the contamination
by another distribution carried by these scenarios means simply a change of the
original probabilities $p_s$, $s = 1, \dots, S$, to
$p_s(t) = (1 - t) p_s + t \pi_s$, $s = 1, \dots, S$, where $\pi_s$,
$s = 1, \dots, S$, denote probabilities of the given scenarios under the
alternative distribution.

For fixed probabilities $p, \pi$, the objective function
$f(z; \lambda, (1 - t)p + t\pi)$ that corresponds to contamination of $p$ by
$\pi$ results from (34) by replacing $p_s$ by $(1 - t) p_s + t \pi_s$. It
depends on a scalar parameter $t \in [0, 1]$ and we shall denote it briefly
$f_\lambda(z, t)$. It is evidently a concave quadratic function of $t$. We
shall assume that the optimal value $\varphi(\lambda, 1)$ is finite, i.e., that
the problem has an optimal solution when the probabilities $p_s$,
$s = 1, \dots, S$, are replaced in (34) by the alternative probabilities
$\pi_s$, $s = 1, \dots, S$.
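
The claim that $f_\lambda(z, t)$ is a concave quadratic function of $t$ can be illustrated numerically. In the sketch below a feasible point is represented only by its first-stage cost `cx` $= c^T x$ and by the scenario costs `v[s]` $= q_s^T y_s$; the function names and all numbers are assumptions made for the example.

```python
def contaminated_objective(v, cx, p, pi, lam, t):
    """f_lambda(z, t): formula (34) with p_s replaced by (1 - t) p_s + t pi_s."""
    w = [(1 - t) * ps + t * qs for ps, qs in zip(p, pi)]
    mean = sum(ws * vs for ws, vs in zip(w, v))
    var = sum(ws * (vs - mean) ** 2 for ws, vs in zip(w, v))
    return cx + mean + lam * var


def curvature(v, cx, p, pi, lam, t, h=0.1):
    """Second difference quotient in t; exact for a quadratic up to rounding."""
    f = lambda s: contaminated_objective(v, cx, p, pi, lam, s)
    return (f(t + h) - 2 * f(t) + f(t - h)) / h ** 2


# Hypothetical data.
v, cx, lam = [1.0, 4.0, 3.0], 3.0, 0.7
p, pi = [0.5, 0.3, 0.2], [0.1, 0.1, 0.8]
shift = sum((qs - ps) * vs for ps, qs, vs in zip(p, pi, v))  # E_pi v - E_p v
curv = curvature(v, cx, p, pi, lam, 0.5)
```

Expanding the variance shows that the second derivative in $t$ equals $-2\lambda \left( \sum_s (\pi_s - p_s) q_s^T y_s \right)^2 \le 0$, which is what `curv` reproduces for these data.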

At any feasible point $z \in Z$, the derivative

(37)  $\frac{d}{dt} f_\lambda(z, t) = f_\lambda(z, 1) - f_\lambda(z, 0) + \lambda (1 - 2t) \left( \sum_s (\pi_s - p_s) q_s^T y_s \right)^2$

and the derivative of the optimal value function
$\varphi(\lambda, (1 - t)p + t\pi)$ of the contaminated problem with respect to
$t$ at the point $t = 0^+$ follows from a standard result of parametric
programming; see e.g. [8, Theor. 17] or discussions in [3]-[5]:

(38)  $\frac{d}{dt} \varphi(\lambda, 0^+) = \min_{z \in Z^*(\lambda, p)} \left[ f_\lambda(z, 1) + \lambda \left( \sum_s (\pi_s - p_s) q_s^T y_s \right)^2 \right] - \varphi(\lambda, p)$

This derivative provides information about the response of the optimal value
function to small changes in probabilities and it can be used in the
construction of global bounds on the optimal value
$\varphi(\lambda, (1 - t)p + t\pi)$ of the contaminated problem for all
$t \in [0, 1]$:

(39)  $(1 - t)\varphi(\lambda, p) + t\varphi(\lambda, \pi) \le \varphi(\lambda, (1 - t)p + t\pi) \le \varphi(\lambda, p) + t \frac{d}{dt} \varphi(\lambda, 0^+)$



whose analysis implies 

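Proposition 4.1. Let $\lambda \ge 0$ be fixed, the set $Z^*(\lambda, p)$ of
optimal solutions of (32) - (33) be nonempty and bounded and the set
$Z^*(\lambda, \pi) \ne \emptyset$. Then for an arbitrary $t \in [0, 1]$

      $|\varphi(\lambda, (1 - t)p + t\pi) - \varphi(\lambda, p)| \le t \max \left\{ |\varphi(\lambda, \pi) - \varphi(\lambda, p)|,\ \left| \frac{d}{dt} \varphi(\lambda, 0^+) \right| \right\}$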



Moreover, 



P) ” ^)P 

      $\frac{d}{dt} \varphi(\lambda, 0^+) \le 0 \implies \varphi(\lambda, (1 - t)p + t\pi) \le \varphi(\lambda, p)\ \ \forall t$
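
The bounds (39) can be tried out on a toy instance. The sketch below replaces the polyhedral feasible set by a small finite list of candidate solutions, each given by its first-stage cost $c^T x$ and its scenario costs $q_s^T y_s$; this finite set, the function names and the data are all assumptions made only to keep the example self-contained.

```python
def objective(z, w, lam):
    """Formula (34) for a candidate z = (c^T x, [q_s^T y_s]) under weights w."""
    cx, v = z
    mean = sum(ws * vs for ws, vs in zip(w, v))
    var = sum(ws * (vs - mean) ** 2 for ws, vs in zip(w, v))
    return cx + mean + lam * var


def mix(p, pi, t):
    """Contaminated probabilities (1 - t) p + t pi."""
    return [(1 - t) * ps + t * qs for ps, qs in zip(p, pi)]


def phi(cands, w, lam):
    """Optimal value over the finite candidate set."""
    return min(objective(z, w, lam) for z in cands)


def dphi_at_zero(cands, p, pi, lam, tol=1e-12):
    """Derivative (38): minimum over the solutions optimal under p."""
    best = phi(cands, p, lam)
    vals = []
    for z in cands:
        if objective(z, p, lam) <= best + tol:
            shift = sum((qs - ps) * vs for ps, qs, vs in zip(p, pi, z[1]))
            vals.append(objective(z, pi, lam) + lam * shift ** 2)
    return min(vals) - best


def bounds(cands, p, pi, lam, t):
    """Lower and upper bound from (39)."""
    lower = (1 - t) * phi(cands, p, lam) + t * phi(cands, pi, lam)
    upper = phi(cands, p, lam) + t * dphi_at_zero(cands, p, pi, lam)
    return lower, upper


# Hypothetical candidates and probabilities.
cands = [(3.0, [1.0, 4.0, 3.0]), (2.5, [2.0, 2.0, 5.0]), (4.0, [0.5, 0.5, 0.5])]
p, pi, lam = [0.5, 0.3, 0.2], [0.1, 0.1, 0.8], 0.7
lower, upper = bounds(cands, p, pi, lam, 0.25)
value = phi(cands, mix(p, pi, 0.25), lam)
```

The lower bound uses only the two optimal values under $p$ and under $\pi$, while the upper bound needs the solutions that are optimal under $p$; neither requires solving the contaminated problem itself.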

If the set of optimal solutions of the initial problem is a singleton,
$Z^*(0, p) = \{z^*(0, p)\}$, evaluation of the derivative (38) does not require
any minimization, an essential simplification that was exploited in [5].
However, some simplification can be obtained under less stringent assumptions:

Proposition 4.2. Let the assumptions of Proposition 4.1 hold true and
assume in addition that all probabilities $p_s$ in the original problem
(32) - (33) are positive and that for all optimal solutions of (32) - (33), the
$x$-part is fixed, denoted by $x^*(p)$. Then the derivative

(40)  $\frac{d}{dt} \varphi(\lambda, 0^+) = c^T x^*(p) - \varphi(\lambda, p) + \sum_s \pi_s q_s^T y_s^*(x^*(p))$
            $+ \lambda \sum_s \pi_s \left[ q_s^T y_s^*(x^*(p)) - \sum_j \pi_j q_j^T y_j^*(x^*(p)) \right]^2$
            $+ \lambda \left( \sum_s (\pi_s - p_s) q_s^T y_s^*(x^*(p)) \right)^2$

where $y_s^*(x^*(p))$ is an arbitrary optimal compensation of $x^*(p)$ under
scenario $s$.

Proof follows from Proposition 3.4: for a unique $x$-part of the optimal
solution of (32) - (33), both the mean value of costs and the variance of
these costs are fixed for all optimal compensations $y_s^*(x^*(p))$ of
$x^*(p)$. This implies that for all scenarios that enter with a positive
probability $p_s$ the costs $q_s^T y_s^*(x^*(p))$ must be fixed for all optimal
compensations. Hence no minimization in (38) is needed and any of the optimal
compensations can be used to evaluate the derivative. □

Proposition 4.3. Under the assumptions of Proposition 4.2 (or in the case when
$Z^*(\lambda, p)$ is a singleton), the derivative
$\frac{d}{dt} \varphi(\lambda, 0^+)$ is linear in $\pi$.

Proof follows by rearranging (40):

      $\frac{d}{dt} \varphi(\lambda, 0^+) = c^T x^*(p) - \varphi(\lambda, p) + \sum_s \pi_s q_s^T y_s^*(x^*(p))$
            $+ \lambda \sum_s \pi_s (q_s^T y_s^*(x^*(p)))^2 + \lambda \left( \sum_s p_s q_s^T y_s^*(x^*(p)) \right)^2$
            $- 2\lambda \sum_s \pi_s q_s^T y_s^*(x^*(p)) \sum_s p_s q_s^T y_s^*(x^*(p))$ □
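
The linearity in $\pi$ stated in Proposition 4.3 can be checked numerically on the form (40). In this sketch `v[s]` stands for the fixed costs $q_s^T y_s^*(x^*(p))$ and `const` for $c^T x^*(p) - \varphi(\lambda, p)$; these names and the numbers are assumptions made for illustration.

```python
def derivative_40(v, p, pi, lam, const):
    """Right-hand side of (40) for fixed scenario costs v[s]."""
    mean_pi = sum(qs * vs for qs, vs in zip(pi, v))
    var_pi = sum(qs * (vs - mean_pi) ** 2 for qs, vs in zip(pi, v))
    shift = sum((qs - ps) * vs for ps, qs, vs in zip(p, pi, v))
    return const + mean_pi + lam * var_pi + lam * shift ** 2


def convex_combination(pi1, pi2, a):
    """a * pi1 + (1 - a) * pi2, again a probability vector."""
    return [a * u + (1 - a) * w for u, w in zip(pi1, pi2)]


# Hypothetical data.
v, p, lam, const = [1.0, 4.0, 3.0], [0.5, 0.3, 0.2], 0.7, -1.2
pi1, pi2, a = [0.1, 0.1, 0.8], [0.6, 0.4, 0.0], 0.35
lhs = derivative_40(v, p, convex_combination(pi1, pi2, a), lam, const)
rhs = (a * derivative_40(v, p, pi1, lam, const)
       + (1 - a) * derivative_40(v, p, pi2, lam, const))
```

The two quadratic terms in $\pi$ in (40) cancel, which is why the value at a mixture of two contaminating distributions equals the mixture of the values.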



Sensitivity results with respect to additionally included scenarios can also
be treated as sensitivity with respect to probabilities: We assume again
that the problem (32) - (33) has been solved for scenarios $\omega_s$,
$s = 1, \dots, S$, and probabilities $p_s > 0\ \forall s$. Inclusion of other
scenarios, say, $\omega_s$, $s = S + 1, \dots, S + S'$, means an essential
extension of the solved problem. To quantify the local response to these
changes of the input data and to get global bounds on the optimal value of the
extended problem we shall start with the problem based on the pooled sample
$\omega_1, \dots, \omega_S, \omega_{S+1}, \dots, \omega_{S+S'}$. The initial
problem is identified by probabilities $p_s > 0$, $s = 1, \dots, S$,
$p_s = 0$, $s = S + 1, \dots, S + S'$, $\sum_s p_s = 1$, and the contaminating
problem by probabilities $\pi_s = 0$, $s = 1, \dots, S$, $\pi_s > 0$,
$s = S + 1, \dots, S + S'$, $\sum_s \pi_s = 1$. The formally extended initial
problem (32) - (33) is
minimize

      $\sum_{s=1}^{S+S'} p_s [c^T x + q_s^T y_s] + \lambda \sum_{s=1}^{S+S'} p_s \left[ q_s^T y_s - \sum_{j=1}^{S+S'} p_j q_j^T y_j \right]^2$

subject to

(41)  $A x = b$
      $T_s x + W_s y_s = h_s$
      $x \ge 0,\ y_s \ge 0,\ s = 1, \dots, S + S'$

and the formula (38) applies under standard assumptions (the set of optimal
solutions of (32) - (33) nonempty and bounded and the existence of an optimal
solution for the problem based on the additional scenarios
$\omega_{S+1}, \dots, \omega_{S+S'}$ and their probabilities
$\pi_{S+1}, \dots, \pi_{S+S'}$). To simplify (38), assume again that the
$x$-part of the optimal solution of (32) - (33), resp. (41), is unique and
equals $x^*(p)$. Similarly as in Proposition 4.2, the costs of optimal
compensations are constant for all original scenarios $s = 1, \dots, S$ and the
minimization in (38) will be carried only over optimal compensations of
$x^*(p)$ for scenarios $\omega_{S+1}, \dots, \omega_{S+S'}$ that enter (32),
(41) with zero probabilities. The sets of these optimal compensations coincide
with the sets of all possible compensations $\mathcal{Y}_s(x^*(p))$,
$s = S + 1, \dots, S + S'$. The derivative

(42)  $\frac{d}{dt} \varphi(\lambda, 0^+) = c^T x^*(p) - \varphi(\lambda, p) + \lambda \left( \sum_{s=1}^{S} p_s q_s^T y_s^*(x^*(p)) \right)^2$
            $+ \min_{y_{S+s} \in \mathcal{Y}_{S+s}(x^*(p)),\ s = 1, \dots, S'} \sum_{s=1}^{S'} \pi_{S+s} \Big( q_{S+s}^T y_{S+s} + \lambda (q_{S+s}^T y_{S+s})^2$
            $\qquad - 2\lambda\, q_{S+s}^T y_{S+s} \sum_{j=1}^{S} p_j q_j^T y_j^*(x^*(p)) \Big)$
        $= c^T x^*(p) + \lambda \left( \sum_{s=1}^{S} p_s q_s^T y_s^*(x^*(p)) \right)^2$
            $+ \sum_{s=1}^{S'} \pi_{S+s} \min_{y_{S+s} \in \mathcal{Y}_{S+s}(x^*(p))} \Big[ q_{S+s}^T y_{S+s} + \lambda (q_{S+s}^T y_{S+s})^2$
            $\qquad - 2\lambda\, q_{S+s}^T y_{S+s} \sum_{j=1}^{S} p_j q_j^T y_j^*(x^*(p)) \Big] - \varphi(\lambda, p)$

If the contaminating distribution is a degenerate one that assigns probability
1 to one new scenario $\omega_*$ for which the corresponding single scenario
problem is solvable, (42) reduces to the optimal value of the simple quadratic
program introduced in [5]:

(43)  $\frac{d}{dt} \varphi(\lambda, 0^+) = c^T x^*(p) - \varphi(\lambda, p)$
            $+ \min_{y_* \in \mathcal{Y}_*(x^*(p))} \left\{ q_*^T y_* + \lambda \left[ q_*^T y_* - \sum_{s=1}^{S} p_s q_s^T y_s^*(x^*(p)) \right]^2 \right\}$

On the other hand, (42) can be written as an average (with weights
$\pi_{S+s}$, $s = 1, \dots, S'$) of expressions (43) computed for each of the
included out-of-sample scenarios $\omega_{S+s}$, $s = 1, \dots, S'$,
separately.

For separability of the derivative (42) with respect to additional scenarios,
the assumption of a unique first-stage solution $x^*(p)$ of the original
problem is essential. For multiple first-stage solutions, (42) computed at any
of these solutions is an upper bound for the derivative
$\frac{d}{dt} \varphi(\lambda, 0^+)$, so that (39) with (42) in the place of
$\frac{d}{dt} \varphi(\lambda, 0^+)$ are still valid bounds but they are less
tight.







5 Conclusions 

Our analysis implies that a genuine difference between optimal solutions of
the two-stage SLP and of the robust optimization problem appears only for
$\lambda$ sufficiently large, say, for $\lambda > \lambda_0$, $\lambda_0 > 0$.
In the limit case of $\lambda \to \infty$, say, for $\lambda > \lambda_1$, the
indices of zero components of the optimal first- and second-stage solutions of
the considered RO problem do not change any more. The numerical values of
$\lambda_0, \lambda_1$ depend on the input data.

Postoptimality and sensitivity results for robust optimization problems 
can be obtained by an extension of the contamination technique as suggested 
in [5]. The first numerical studies [3] indicate that the upper bound for the 
optimal value based on the pooled sample is quite precise; see Table. The 
results concern a modified metal melting problem from [7] with 64 equiprob- 
able scenarios of the input composition (elements of the technology matrices 
Ts) and with one additional scenario. The value of t has been fixed to 1/65. 

The original results of [5] were based on the assumption of unique first- 
and second-stage optimal solutions of the initial RO problem. This paper 
demonstrates that, similarly as for the two-stage SLP, the assumption can be 
relaxed to uniqueness of the first-stage solutions. In this case, the desired ad- 
ditional information about resistance with respect to additional scenarios or 
with respect to changes of probabilities follows by evaluation (or estimation) 
of the optimal value based on the alternative/contaminating distribution and 
on solution of simple quadratic programs (43) for all new scenarios. 

Acknowledgement. This paper has benefited from discussions with Stavros 
Zenios (University of Cyprus). 

On Constrained Discontinuous Optimization



Yuri Ermoliev^ and Vladimir Norkin^ 

^ International Institute for Applied System Analysis, A-2361 Laxenburg, 
Austria 

^ Glushkov Institute of Cybernetics, 252207 Kiev, Ukraine 



Abstract. In this paper we extend the results of Ermoliev, Norkin
and Wets [8] and Ermoliev and Norkin [7] to the case of constrained
discontinuous optimization problems. General optimality conditions for
problems with nonconvex feasible sets are obtained. An easily implementable
random search technique is proposed.

Key words. Discontinuous systems, necessary optimality conditions, 
averaged functions, mollifier subgradients, stochastic optimization 

1 Introduction 

In this paper we extend further the results of Ermoliev, Norkin and
Wets [8] and Ermoliev and Norkin [7] to a general constrained
discontinuous optimization problem:

minimize $F(x)$ (1)

subject to $x \in K \subset R^n$ (2)

where $F(x)$ is a (strongly) lower semicontinuous function and $K$ is a
compact set.

As we showed in [7], the class of strongly lower semicontinuous functions is
appropriate for modeling and optimization of abruptly changing systems without
instantaneous jumps and returns. In particular, we analyzed risk control
problems, optimization of stochastic networks and discrete event systems,
screening irreversible changes and stochastic pollution control. Another
important application may be stochastic jumping processes describing risk
reserves of interdependent insurance and reinsurance companies. In a rather
general form the risk reserves can be understood as "reservoirs", where risk
premiums are continuously flowing in and random claims at random time moments
are abruptly draining them out. A sample path of such a process is a strongly
lower semicontinuous function with random jumps at claim occurrence times.

In a sense the main aim of this article is to provide proofs of optimality
conditions for general discontinuous constrained optimization problems
discussed in [7]. In Section 2 we analyze situations when the expectation
function belongs to the class of strongly lower semicontinuous functions. The
general idea of discontinuous optimization is presented in Section 3.
Optimality conditions for discontinuous functions and general constraints are
analysed in Section 4. Section 5 outlines possible computational procedures.

2 Some classes of discontinuous functions 

In nonsmooth analysis different classes of continuous functions are in- 
troduced and studied. The same is necessary for discontinuous func- 
tions. We basically restrict possible discontinuity to the case of strongly 
lower semicontinuous functions, which seem to be most important for 
applications. 

Definition 2.1 A function $F : R^n \to R^1$ is called strongly lower
semicontinuous at $x$ if it is lower semicontinuous at $x$ and there exists a
sequence $x^k \to x$ with $F$ continuous at $x^k$ (for all $k$) such that
$F(x^k) \to F(x)$. The function $F$ is called strongly lower semicontinuous
(strongly lsc) on $X \subseteq R^n$ if this holds for all $x \in X$.

Definition 2.2 A lower semicontinuous function $F : R^n \to R^1$ is
called directionally continuous at $x$ if there exists an open (direction) set
$D(x)$ containing sequences $x^k \in D(x)$, $x^k \to x$, such that
$F(x^k) \to F(x)$. The function $F(x)$ is called directionally continuous if
this holds for any $x \in R^n$.

Definition 2.3 A function $F(x)$ is called piecewise continuous if for any
open set $A \subset R^n$ there is another open set $B \subset A$ on which
$F(x)$ is continuous.

Proposition 2.1 If the function $F(x)$ is piecewise continuous and
directionally continuous then it is strongly lower semicontinuous.

Proof. By the definition of piecewise continuity, for any open vicinity
$V(x)$ of $x$ we can find an open set $B \subset D(x) \cap V(x)$ on which the
function $F$ is continuous. Hence there exists a sequence $x^k \in D(x)$,
$x^k \to x$, with $F$ continuous at $x^k$. By the definition of directional
continuity, $F(x^k) \to F(x)$. □

Properties of directional continuity, piecewise continuity and strong
lower semicontinuity can be easily verified for one-dimensional functions.
For instance, if a one-dimensional function $F(x)$, $x \in R$, is (i)
lower semicontinuous, (ii) continuous almost everywhere in $R$ and (iii)
at each point of discontinuity $x \in R$ continuous either from the left or
from the right, then $F(x)$ is strongly lsc. The next proposition clarifies
the structure of multidimensional discontinuous functions of interest.

Proposition 2.2 If $F(x) = F_0(F_1(x_1), \ldots, F_m(x_m))$, where
$x = (x_1, \ldots, x_m)$, $x_i \in R^{n_i}$, the function $F_0(\cdot)$ is
continuous and the functions $F_i(x_i)$, $i = 1, \ldots, m$, are strongly lsc
(directionally continuous), then the composite function $F(x)$ is also
strongly lsc (directionally continuous). If
$F(x) = F_0(F_1(x), \ldots, F_m(x))$, $x \in R^n$, where $F_0(\cdot)$ is
continuous and $F_i(x)$, $i = 1, \ldots, m$, are piecewise continuous, then
$F(x)$ is also piecewise continuous.

In particular, strong Isc, directional continuity and piecewise con- 
tinuity are preserved under continuous transformations. 

Proof is evident. 

The next proposition gives a sufficient condition for a mathematical
expectation function $F(x) = E f(x, \omega)$ to be strongly lower
semicontinuous.

Proposition 2.3 Assume the function $f(\cdot, \omega)$ is locally bounded
around $x$ by a function integrable in $\omega$, piecewise continuous around
$x$ and a.s. directionally continuous at $x$ with direction set
$D(x, \omega) = D(x)$ (independent of $\omega$). Suppose $\omega$ takes only a
finite or countable number of values. Then $F(x) = E f(x, \omega)$ is
strongly lsc at $x$.

Proof. Lower semicontinuity of $F$ follows from Fatou's lemma. The
convergence of $F(x^k)$ to $F(x)$ for $x^k \to x$, $x^k \in D(x)$, follows from
Lebesgue's dominated convergence theorem. Hence $F$ is directionally
continuous at $x$ in $D(x)$. It remains to show that in any open set
$A \subset R^n$ which is close to $x$ there are points of continuity of $F$.
For the case when $\omega$ takes a finite number of values
$\omega_1, \ldots, \omega_m$ with probabilities $p_1, \ldots, p_m$ the function
$F(\cdot) = \sum_{i=1}^{m} p_i f(\cdot, \omega_i)$ is clearly piecewise
continuous. For the case when $\omega$ takes a countable number of values
there is a sequence of closed balls $B_i \subseteq B_{i-1} \subset A$ convergent
to some point $y \in A$ with $f(\cdot, \omega_i)$ continuous on $B_i$. We shall
show that $F(\cdot) = \sum_{i=1}^{\infty} p_i f(\cdot, \omega_i)$ is continuous
at $y$. By assumption $|f(x, \omega_i)| \le C_i$ for $x \in A$ and
$\sum_{i=1}^{\infty} p_i C_i < +\infty$. Then

$$F(x) - F(y) = \sum_{i=1}^{\infty} p_i \bigl(f(x,\omega_i) - f(y,\omega_i)\bigr)
= \sum_{i=1}^{m} p_i \bigl(f(x,\omega_i) - f(y,\omega_i)\bigr) + \delta_m(x,y),$$

$$|\delta_m(x,y)| \le \sum_{i=m+1}^{\infty} 2 p_i C_i, \qquad x, y \in A.$$

Thus for any $x^k \to y$

$$\limsup_k F(x^k) \le F(y) + \sum_{i=m+1}^{\infty} 2 p_i C_i,$$

$$\liminf_k F(x^k) \ge F(y) - \sum_{i=m+1}^{\infty} 2 p_i C_i.$$

Since $\sum_{i=m+1}^{\infty} 2 p_i C_i \to 0$ as $m \to \infty$, we obtain
$\lim_k F(x^k) = F(y)$. □
Let us remark that functions of the form $f(x, \omega) = f(x - \omega)$,
$x, \omega \in R^n$, with $f(\cdot)$ piecewise and directionally continuous
have $D(x)$ independent of $\omega$.

Propositions 2.1-2.3 provide a certain calculus for strongly lsc functions.

3 Averaged functions and mollifier subgradients

In order to optimize discontinuous functions we approximate them by
so-called averaged functions, which are often considered in optimization
theory (see Antonov and Katkovnik [1], Katkovnik and Kulchitsky [13],
Archetti and Betro [2], Warga [21], Katkovnik [12], Gupal [9], [10],
Gupal and Norkin [11], Rubinstein [20], Batuhtin and Maiboroda [4],
Mayne and Polak [15], Mikhalevich, Gupal and Norkin [16], Ermoliev
and Gaivoronski [6], Kreimer and Rubinstein [14], Batuhtin [3],
Ermoliev, Norkin and Wets [8]). The convolution of a discontinuous
function with an appropriate mollifier (probability density function)
improves continuity and differentiability, but on the other hand it
increases the computational complexity of the resulting problems, since it
transforms a deterministic function $F(x)$ into an expectation function
defined as a multiple integral. Therefore, this operation is meaningful only
in combination with appropriate stochastic optimization techniques. Our
purpose is to introduce such a technique and to develop a certain
subdifferential calculus for discontinuous functions. Let us introduce the
necessary notions and facts, which are generalized in the next section to
the case of constrained problems.

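Definition 3.1 A set (family) of bounded integrable functions
$\{\psi_\theta : R^n \to R_+, \ \theta \in R_+\}$ satisfying the conditions

$$\int_{R^n} \psi_\theta(z)\,dz = 1, \qquad
\mathrm{supp}\,\psi_\theta := \{z \in R^n \mid \psi_\theta(z) > 0\} \subseteq \rho_\theta B$$

with a unit ball $B$, $\rho_\theta \downarrow 0$ as $\theta \downarrow 0$, is
called a family of mollifiers. Mollifiers are called smooth if the functions
$\psi_\theta(\cdot)$ are continuously differentiable.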

Given a locally integrable (discontinuous) function $F : R^n \to R^1$
and a family of mollifiers $\{\psi_\theta\}$, the associated family
$\{F^\theta, \ \theta \in R_+\}$ of averaged functions is defined by

$$F^\theta(x) := \int_{R^n} F(x - z)\,\psi_\theta(z)\,dz
= \int_{R^n} F(z)\,\psi_\theta(x - z)\,dz. \qquad (3)$$

Mollifiers may also have unbounded support (see [8]). 
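
To make the averaging operation (3) concrete, the following is a minimal one-dimensional sketch. The uniform mollifier on $[-\theta/2, \theta/2]$, the midpoint quadrature and the names `averaged` and `step` are illustrative assumptions, not part of the original text.

```python
def averaged(F, x, theta, n=2000):
    """Approximate F^theta(x) = integral of F(x - z) * psi_theta(z) dz.

    Assumption: psi_theta is the uniform density 1/theta on
    [-theta/2, theta/2]; the integral is computed by the midpoint rule.
    """
    h = theta / n
    total = 0.0
    for i in range(n):
        z = -theta / 2 + (i + 0.5) * h      # midpoint of the i-th cell
        total += F(x - z) * (1.0 / theta) * h
    return total


def step(x):
    """Lower semicontinuous step function: 0 for x <= 0, 1 for x > 0."""
    return 1.0 if x > 0 else 0.0


if __name__ == "__main__":
    # The jump at x = 0 is replaced by a linear ramp of width theta.
    for x in (-0.1, -0.025, 0.0, 0.025, 0.1):
        print(x, round(averaged(step, x, theta=0.1), 3))
```

In this sketch the averaged step function is continuous, although the step function itself is not; the price is that every evaluation becomes an integral.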

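Example 3.1 Assume $F(x) = E_\omega f(x, \omega)$. If $f(x, \omega)$ is such
that $E_\omega |f(x, \omega)|$ exists and grows at infinity not faster than
some polynomial of $x$, and the random vector $\eta$ has the standard normal
distribution, then for

$$\xi_\theta(x, \eta, \omega) = \frac{1}{\theta}\,[f(x + \theta\eta, \omega) - f(x, \omega)]\,\eta$$

or

$$\xi_\theta(x, \eta, \omega) = \frac{1}{2\theta}\,[f(x + \theta\eta, \omega) - f(x - \theta\eta, \omega)]\,\eta,
\qquad \theta > 0,$$

we have $\nabla F^\theta(x) = E_{\eta\omega}\,\xi_\theta(x, \eta, \omega)$. The
finite difference approximations $\xi_\theta(x, \eta, \omega)$ are unbiased
estimates of $\nabla F^\theta(x)$. As in [7], we can call them stochastic
mollifier gradients of $F(x)$.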
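
The symmetric finite-difference estimate of Example 3.1 can be sketched as follows. For simplicity the sketch assumes a deterministic integrand (no dependence on $\omega$); the function names `mollifier_gradient_sample` and `mean_gradient` are hypothetical.

```python
import random


def mollifier_gradient_sample(f, x, theta, rng):
    """One sample of xi = [f(x + theta*eta) - f(x - theta*eta)] * eta / (2*theta),

    where eta is a standard normal vector and x is a list of floats.
    Assumption: f is deterministic, so only eta is sampled.
    """
    eta = [rng.gauss(0.0, 1.0) for _ in x]
    x_plus = [xi + theta * e for xi, e in zip(x, eta)]
    x_minus = [xi - theta * e for xi, e in zip(x, eta)]
    scale = (f(x_plus) - f(x_minus)) / (2.0 * theta)
    return [scale * e for e in eta]


def mean_gradient(f, x, theta, samples, seed=0):
    """Average many samples to approximate the gradient of the averaged function."""
    rng = random.Random(seed)
    acc = [0.0] * len(x)
    for _ in range(samples):
        g = mollifier_gradient_sample(f, x, theta, rng)
        acc = [a + gi for a, gi in zip(acc, g)]
    return [a / samples for a in acc]


if __name__ == "__main__":
    # Smooth check: for f(x) = |x|^2 the averaged gradient is 2x.
    print(mean_gradient(lambda v: sum(t * t for t in v), [1.0, -2.0], 0.1, 20000))
    # Discontinuous check: unit step at 0; the averaged gradient at 0 is the
    # Gaussian density with standard deviation theta evaluated at 0.
    print(mean_gradient(lambda v: 1.0 if v[0] > 0 else 0.0, [0.0], 0.1, 20000))
```

Only function values are used, so the estimate remains meaningful at points where the original function has no gradient.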

Definition 3.2 (See, for example, Rockafellar and Wets [17]). A sequence
of functions $\{F^k : R^n \to R\}$ epi-converges to $F : R^n \to R$
relative to $X \subseteq R^n$ if for any $x \in X$

(i) $\liminf_{k\to\infty} F^k(x^k) \ge F(x)$ for all $x^k \to x$, $x^k \in X$;

(ii) $\lim_{k\to\infty} F^k(x^k) = F(x)$ for some sequence $x^k \to x$, $x^k \in X$.

The sequence $\{F^k\}$ epi-converges to $F$ if this holds relative to
$X = R^n$.

For example, if $g : R^n \times R^m \to R$ is (jointly) lsc at $(x, \bar{y})$
and is continuous in $y$ at $\bar{y}$, then for any sequence $y^k \to \bar{y}$
the corresponding sequence of functions $F^k(\cdot) = g(\cdot, y^k)$
epi-converges to $F(\cdot) = g(\cdot, \bar{y})$.

The following important property of epi-convergent functions shows
that constrained optimization of a discontinuous function $F(x)$ can in
principle be carried out through optimization of approximating
epi-convergent functions $F^k(x)$.

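Theorem 3.1 If a sequence of functions $\{F^k : R^n \to R\}$ epi-converges
to $F : R^n \to R$ then for any compact $K \subset R^n$

$$\lim_{\epsilon \downarrow 0}\bigl(\liminf_{k}(\inf_{K_\epsilon} F^k)\bigr)
= \lim_{\epsilon \downarrow 0}\bigl(\limsup_{k}(\inf_{K_\epsilon} F^k)\bigr)
= \inf_{K} F, \qquad (4)$$

where $K_\epsilon = K + \epsilon B$, $B = \{x \in R^n \mid \|x\| \le 1\}$. If
$F^k(x^k_\epsilon) \le \inf_{K_\epsilon} F^k + \delta_k$, $x^k_\epsilon \in K_\epsilon$,
$\delta_k \downarrow 0$ as $k \to \infty$, then

$$\limsup_{\epsilon \downarrow 0}\bigl(\limsup_{k} x^k_\epsilon\bigr) \subseteq \mathrm{argmin}_K F, \qquad (5)$$

where $(\limsup_k x^k_\epsilon)$ denotes the set $X_\epsilon$ of cluster points
of the sequence $\{x^k_\epsilon\}$ and $(\limsup_{\epsilon \downarrow 0} X_\epsilon)$
denotes the set of cluster points of the family $\{X_\epsilon, \ \epsilon \in R_+\}$
as $\epsilon \downarrow 0$.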

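Proof. Note that $\inf_{K_\epsilon} F^k$ increases monotonically (does not
decrease) as $\epsilon \downarrow 0$, hence the same holds for
$\liminf_{k\to\infty} \inf_{K_\epsilon} F^k$ and
$\limsup_{k\to\infty} \inf_{K_\epsilon} F^k$. Thus the limits over
$\epsilon \downarrow 0$ in (4) exist.

Let us take an arbitrary sequence $\epsilon_m \downarrow 0$, indices $k_m^s$ and
points $x_m^s \in K_{\epsilon_m}$ such that under fixed $m$

$$\liminf_{k}(\inf_{K_{\epsilon_m}} F^k) = \lim_{s}(\inf_{K_{\epsilon_m}} F^{k_m^s})
= \lim_{s} F^{k_m^s}(x_m^s).$$

Thus

$$\lim_{\epsilon\downarrow 0}\bigl(\limsup_k(\inf_{K_\epsilon} F^k)\bigr)
\ge \lim_{\epsilon\downarrow 0}\bigl(\liminf_k(\inf_{K_\epsilon} F^k)\bigr)
= \lim_{m\to\infty}\lim_{s\to\infty} F^{k_m^s}(x_m^s)
= \lim_{m\to\infty} F^{k_m^{s_m}}(x_m^{s_m})$$

for some indices $s_m$. By property (i) of epi-convergence
$\lim_{m\to\infty} F^{k_m^{s_m}}(x_m^{s_m}) \ge \inf_K F$. Hence

$$\lim_{\epsilon\downarrow 0}\bigl(\limsup_k(\inf_{K_\epsilon} F^k)\bigr)
\ge \lim_{\epsilon\downarrow 0}\bigl(\liminf_k(\inf_{K_\epsilon} F^k)\bigr)
\ge \inf_K F.$$

Let us prove the opposite inequality. Since $F$ is lower semicontinuous,
$F(x) = \inf_K F$ for some $x \in K$. By condition (ii) of epi-convergence
there exists a sequence $x^k \to x$ such that $F^k(x^k) \to F(x)$.
For $k$ sufficiently large $x^k \in K_\epsilon$, hence
$\inf_{K_\epsilon} F^k \le F^k(x^k)$ and

$$\lim_{\epsilon\downarrow 0}\bigl(\liminf_k(\inf_{K_\epsilon} F^k)\bigr)
\le \lim_{\epsilon\downarrow 0}\bigl(\limsup_k(\inf_{K_\epsilon} F^k)\bigr)
\le F(x) = \inf_K F.$$

The proof of (4) is completed.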

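Now we prove (5). Let $x^k_\epsilon \in K_\epsilon$ and
$F^k(x^k_\epsilon) \le \inf_{K_\epsilon} F^k + \delta_k$, $\delta_k \downarrow 0$.
Denote $X_\epsilon = \limsup_k x^k_\epsilon \subseteq K_\epsilon$. Let
$\epsilon_m \downarrow 0$, $x_{\epsilon_m} \in X_{\epsilon_m}$ and
$x_{\epsilon_m} \to x \in K$ as $m \to \infty$. By construction of
$X_{\epsilon_m}$, for each fixed $m$ there exist sequences
$x^{k_m^s}_{\epsilon_m} \to x_{\epsilon_m}$ satisfying
$F^{k_m^s}(x^{k_m^s}_{\epsilon_m}) \le \inf_{K_{\epsilon_m}} F^{k_m^s} + \delta_{k_m^s}$,
$\delta_{k_m^s} \downarrow 0$ as $s \to \infty$. By property (i)

$$F(x_{\epsilon_m}) \le \liminf_{s} F^{k_m^s}(x^{k_m^s}_{\epsilon_m})
\le \limsup_{k}(\inf_{K_{\epsilon_m}} F^k).$$

Due to lower semicontinuity of $F$ and (4) we obtain

$$F(x) \le \liminf_{m\to\infty} F(x_{\epsilon_m})
\le \liminf_{\epsilon_m \downarrow 0}\bigl(\limsup_{k}(\inf_{K_{\epsilon_m}} F^k)\bigr) = \inf_K F,$$

hence $x \in \mathrm{argmin}_K F$, which proves (5). □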

Note that in Theorem 3.1 we could relax the constraint set $K$ in
different ways; for instance, if $K = \{x \in R^n \mid G(x) \le 0\}$ with some
lower semicontinuous function $G(x)$, then we could define
$K_\epsilon = \{x \in R^n \mid G(x) \le \epsilon\}$, $\epsilon > 0$.

Let us illustrate the result of Theorem 3.1 by the following example.

Example 3.2 Consider the discontinuous optimization problem

$$\min_{x \ge 0} \left[ F(x) = \begin{cases} 0, & x \le 0, \\ 1, & x > 0 \end{cases} \right]. \qquad (6)$$

Let $F^\theta(x)$ be a family of averaged functions for $F$ associated with a
family of mollifiers $\psi_\theta(y) = \frac{1}{\theta}\psi(y/\theta)$,
$\theta > 0$, where the function $\psi(\cdot)$ is symmetric with respect to the
point $y = 0$. Obviously, the functions $F^\theta(x)$ epi-converge to $F$ and
$\min_{x \ge 0} F^\theta(x) = F^\theta(0) = 1/2$. If we do not relax the
constraint set $\{x \mid x \ge 0\}$ then optimization of the approximate
functions $F^\theta(x)$ over the set $\{x \mid x \ge 0\}$ leads to a wrong result:

$$\lim_{\theta \to 0} \min_{x \ge 0} F^\theta(x) = \frac{1}{2}.$$

The relaxation according to Theorem 3.1 leads to the true optimal value
of the problem:

$$\lim_{\theta \to 0} \min_{x \ge -\epsilon} F^\theta(x) = 0$$

and thus

$$\lim_{\epsilon \to 0}\bigl(\lim_{\theta \to 0} \min_{x \ge -\epsilon} F^\theta(x)\bigr) = 0 = \min_{x \ge 0} F(x).$$
The following statement jointly with Theorem 3.1 shows that the 
averaged functions can be used for optimization of discontinuous func- 
tions. 

Theorem 3.2 (Ermoliev et al. [8]). For any strongly lower semicontinuous,
locally integrable function $F : R^n \to R$, any associated sequence of
averaged functions $\{F^{\theta_k}, \ \theta_k \downarrow 0\}$ epi-converges
to $F$.

Together with Propositions 2.1 and 2.3, Theorem 3.2 gives sufficient
conditions for averaged functions to epi-converge to the original
discontinuous expectation function.

A subdifferential calculus for nonsmooth and discontinuous functions
can be developed on the basis of their mollifier approximations.

Definition 3.3 Let the function $F : R^n \to R$ be locally integrable and
$\{F^k := F^{\theta_k}\}$ be a sequence of averaged functions generated from
$F$ by means of the sequence of mollifiers
$\{\psi^k := \psi_{\theta_k} : R^n \to R\}$, where $\theta_k \downarrow 0$ as
$k \to \infty$. Assume that the mollifiers are such that the averaged
functions $F^k$ are smooth (of class $C^1$). The set of $\psi$-mollifier
subgradients (subdifferential) of $F$ at $x$ is by definition

$$\partial_\psi F(x) := \limsup_{k} \{\nabla F^k(x^k) \mid x^k \to x\},$$

i.e. $\partial_\psi F(x)$ consists of the cluster points of all possible
sequences $\{\nabla F^k(x^k)\}$ such that $x^k \to x$.

For example, for function (6) $\partial_\psi F(0) = \{g \mid g \ge 0\}$.

The subdifferential $\partial_\psi F(x)$ has the following properties (see
Ermoliev, Norkin and Wets [8]):

$\partial_\psi F(x) = \partial F(x)$ for a convex function $F(x)$;
the convex hull of $\partial_\psi F(x)$ coincides with the Clarke
subdifferential $\partial F(x)$ for a locally Lipschitzian function $F(x)$;
$\partial_\psi F(x)$ coincides with the Warga subdifferential [21]
$\partial_W F(x)$ for a continuous function $F(x)$.

Theorem 3.3 (Ermoliev et al. [8]). Suppose that $F : R^n \to R$ is
strongly lower semicontinuous and locally integrable. Then for any
sequence $\{\psi^k\}$ of smooth mollifiers, we have
$0 \in \partial_\psi F(x)$ whenever $x$ is a local minimizer of $F$.



4 Optimality conditions 



Theorem 3.3 can be used for constrained optimization problems if exact
penalties are applicable. Unfortunately, this operation can practically
remove some important minima of the original problem. Consider
problem (6). Here the function $F(x)$ is strongly lsc and the point $x = 0$
is a reasonable solution of the problem. We could replace this problem, for
example, by the following one:

$$\min_{x \ge 0} \left[ \tilde{F}(x) = \begin{cases} F(x), & x \ge 0, \\ -x + 1, & x < 0 \end{cases} \right].$$

The penalty function $\tilde{F}(x)$ has a single discontinuity point $x = 0$,
where $\tilde{F}$ achieves its global minimum $\tilde{F}(0) = 0$. Thus penalty
functions may lead to isolated minima, which are difficult to discover.

Besides, we also encounter the following difficulties. Consider

$$\min\{\sqrt{x} \mid x \ge 0\}. \qquad (7)$$

In any reasonable definition of gradients the gradient of the function
$\sqrt{x}$ at the point $x = 0$ equals $+\infty$. Hence, to formulate necessary
optimality conditions for such problems, possibly involving
discontinuities, we need a special notion which incorporates infinite
quantities. An appropriate notion is the cosmic vector space
$\overline{R^n}$ introduced by Rockafellar and Wets [18]. Denote
$R_+ = \{x \in R \mid x \ge 0\}$ and $\bar{R}_+ = R_+ \cup \{+\infty\}$.

Definition 4.1 Define a (cosmic) space $\overline{R^n}$ as a set of pairs
$\bar{x} = (x, a)$, where $x \in R^n$, $\|x\| = 1$ and $a \in \bar{R}_+$. All
pairs of the form $(x, 0)$ are considered identical and are denoted as
$\bar{0}$.

A topology in the space $\overline{R^n}$ is defined by means of cosmically
convergent sequences.

Definition 4.2 A sequence $(x_k, a_k) \in \overline{R^n}$ is called
(cosmically) convergent to an element $(x, a) \in \overline{R^n}$ (denoted
$c\text{-}\lim_{k\to\infty}(x_k, a_k)$) if either $\lim_k a_k = a = 0$ or there
exist both limits $\lim_k x_k \in R^n$, $\lim_k a_k \in \bar{R}_+$ and
$x = \lim_k x_k$, $a = \lim_k a_k \ne 0$, i.e.

$$c\text{-}\lim_k (x_k, a_k) = \begin{cases}
(\lim_k x_k, \ \lim_k a_k) & \text{if } \lim_k a_k < +\infty, \\
(\lim_k x_k, \ +\infty) & \text{if } a_k \to +\infty.
\end{cases}$$

Denote

$$c\text{-}\mathrm{Limsup}_k (x_k, a_k) = \{(x, a) \in \overline{R^n} \mid \exists \{k_m\} : \ (x, a) = c\text{-}\lim_{m\to\infty}(x_{k_m}, a_{k_m})\}.$$

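For a closed set $K \subseteq R^n$ denote the tangent cone

$$T_K(x) = \limsup_{\tau \downarrow 0} \frac{K - x}{\tau}$$

to the set $K$ at the point $x$, the normal cones

$$\hat{N}_K(x) = \{v \in R^n \mid \langle v, w \rangle \le 0 \ \text{for all } w \in T_K(x)\},$$

$$N_K(x) = \limsup_{x' \to x} \hat{N}_K(x')$$

and the extended normal cone

$$\bar{N}_K(x) = \{(y, b) \in \overline{R^n} \mid y \in N_K(x), \ \|y\| = 1, \ b \in \bar{R}_+\}.$$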

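For what follows we need the following closedness property of the normal
cone mapping $(x, \epsilon) \to N_{K_\epsilon}(x)$.

Lemma 4.1 Let $K_\epsilon = K + \epsilon B$, $B = \{x \in R^n \mid \|x\| \le 1\}$.
Then for any sequences $x' \to x \in K$ and $\epsilon \to 0$

$$\limsup_{x' \to x, \ \epsilon \downarrow 0} N_{K_\epsilon}(x') \subseteq N_K(x).$$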

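Proof. For $x \in R^n$ define $y(x) \in K$ such that

$$\|y(x) - x\| = \inf_{y' \in K} \|x - y'\|.$$

Let us show that $T_K(y(x)) \subseteq T_{K_\epsilon}(x)$. Let
$w \in T_K(y(x))$, i.e.

$$w = \lim_{\nu} \frac{y^\nu - y(x)}{\tau_\nu}, \quad \text{where } y^\nu \in K, \ y^\nu \to y(x), \ \tau_\nu \to 0.$$

Denote $x^\nu = y^\nu + (x - y(x)) \in K_\epsilon$. Then by definition

$$w = \lim_{\nu} \frac{x^\nu - x}{\tau_\nu} \in T_{K_\epsilon}(x)$$

and thus $T_K(y(x)) \subseteq T_{K_\epsilon}(x)$. This inclusion implies
$\hat{N}_{K_\epsilon}(x) \subseteq \hat{N}_K(y(x))$ and
$N_{K_\epsilon}(x) \subseteq N_K(y(x))$. Hence

$$\limsup_{x' \to x, \ \epsilon \downarrow 0} N_{K_\epsilon}(x')
\subseteq \limsup_{x' \to x} N_K(y(x')) \subseteq N_K(x). \quad □$$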

Corollary 4.1 For extended normal cones we have the same closedness
property:

$$\limsup_{x' \to x, \ \epsilon \downarrow 0} \bar{N}_{K_\epsilon}(x') \subseteq \bar{N}_K(x).$$

Remark. We could use another sort of relaxation for the set $K$. Suppose
$K$ is convex and is given by an inequality constraint:

$$K = \{x \in R^n \mid G(x) \le 0\}$$

with some convex function $G(x)$. Consider a relaxed set

$$K_\epsilon = \{x \in R^n \mid G(x) \le \epsilon\}.$$

Normal cones to $K_\epsilon$ and $K = K_0$ are formed by subdifferentials
$\partial G(x)$, $x \in K_\epsilon$, of the function $G$:

$$N_{K_\epsilon}(x) = \{\lambda\,\partial G(x) \mid \lambda \ge 0\} \quad \text{if } G(x) = \epsilon.$$

Now the closedness property of the mapping $(x, \epsilon) \to N_{K_\epsilon}(x)$
stated in Lemma 4.1 follows from closedness of the subdifferential mapping
$x \to \partial G(x)$.

Definition 4.3 Let the function $F : R^n \to R$ be locally integrable and
$\{F^k := F^{\theta_k}\}$ be a sequence of averaged functions generated from
$F$ by convolution with mollifiers $\{\psi^k := \psi_{\theta_k} : R^n \to R\}$,
where $\theta_k \downarrow 0$ as $k \to \infty$. Assume that the mollifiers are
such that the averaged functions $F^k$ are smooth (of class $C^1$). The set
of the extended $\psi$-mollifier subgradients of $F$ at $x$ is by definition

$$\bar{\partial}_\psi F(x) := c\text{-}\mathrm{Limsup}_k \left\{ \left( \frac{\nabla F^k(x^k)}{\|\nabla F^k(x^k)\|}, \ \|\nabla F^k(x^k)\| \right) \ \Big| \ x^k \to x \right\},$$

where the expression $\nabla F^k(x^k) / \|\nabla F^k(x^k)\|$ is replaced by any
unit vector if $\nabla F^k(x^k) = 0$, i.e. $\bar{\partial}_\psi F(x)$ consists
of the cluster points (in the cosmic space $\overline{R^n}$) of all possible
sequences $\{(\nabla F^k(x^k) / \|\nabla F^k(x^k)\|, \ \|\nabla F^k(x^k)\|)\}$
such that $x^k \to x$.

The full (extended) $\psi$-mollifier subgradient set is
$\bar{\partial} F(x) := \bigcup_\psi \bar{\partial}_\psi F(x)$, where $\psi$
ranges over all possible sequences of mollifiers that generate smooth
averaged functions.

The extended mollifier subdifferential $\bar{\partial}_\psi F(x)$ is always a
non-empty closed set in $\overline{R^n}$.

Now we can formulate optimality conditions (8) for the constrained
discontinuous optimization problem $\min\{F(x) \mid x \in K\}$, where $F(x)$
may have the form of an expectation. Theorems 4.1, 4.2 and Corollaries 4.2,
4.3 clarify the structure of the set of points satisfying optimality
condition (8).

Theorem 4.1 Let $K$ be a closed set in $R^n$. Assume that a locally
integrable function $F$ has a local minimum relative to $K$ at some point
$x \in K$ and there is a sequence $x^k \in K$, $x^k \to x$, with $F$ continuous
at $x^k$ and $F(x^k) \to F(x)$. Then, for any sequence $\{\psi^k\}$ of smooth
mollifiers, one has

$$-\bar{\partial}_\psi F(x) \cap \bar{N}_K(x) \ne \emptyset, \qquad (8)$$

where $-\bar{\partial}_\psi F(x) = \{(-g, a) \in \overline{R^n} \mid (g, a) \in \bar{\partial}_\psi F(x)\}$.

Proof. Let $x$ be a local minimizer of $F$ on $K$. For a sufficiently
small compact neighborhood $V$ of $x$, define
$\phi(z) := F(z) + \|z - x\|^2$. The function $\phi$ achieves its global
minimum on $K \cap V$ at $x$. Consider also the averaged functions

$$\phi^k(z) = \int_{R^n} \phi(y - z)\,\psi^k(y)\,dy = F^k(z) + \beta^k(x, z),$$

where

$$F^k(z) = \int_{R^n} F(y - z)\,\psi^k(y)\,dy, \qquad
\beta^k(x, z) = \int_{R^n} \|y - z - x\|^2\,\psi^k(y)\,dy.$$

In [8] it is shown that (i) the functions $\phi^k$ are continuously
differentiable, (ii) they epi-converge to $\phi$ relative to $K \cap V$ and
(iii) their global minimizers $z^k$ on $K \cap V$ converge to $x$ as
$k \to \infty$. For sufficiently large $k$ the following necessary optimality
condition is satisfied:

$$-\nabla F^k(z^k) = n(z^k) \in N_K(z^k), \qquad z^k \in K.$$

If $\nabla F^{k_m}(z^{k_m}) = 0$ for some $\{z^{k_m} \to x\}$ then also
$\bar{0} \in \bar{\partial}_\psi F(x)$ and $\bar{0} \in \bar{N}_K(x)$. If
$\nabla F^{k_m}(z^{k_m}) \to g \ne 0$ for some $\{z^{k_m} \to x\}$ then

$$-\frac{\nabla F^{k_m}(z^{k_m})}{\|\nabla F^{k_m}(z^{k_m})\|} \to -\frac{g}{\|g\|} \in N_K(x)$$

and $\left(\frac{g}{\|g\|}, \|g\|\right) \in \bar{\partial}_\psi F(x)$,
$\left(-\frac{g}{\|g\|}, \|g\|\right) \in \bar{N}_K(x)$. If
$\limsup_k \|\nabla F^k(z^k)\| = +\infty$ then for some $\{z^{k_m} \to x\}$

$$-\frac{\nabla F^{k_m}(z^{k_m})}{\|\nabla F^{k_m}(z^{k_m})\|} \to -g \in N_K(x),$$

and $(g, +\infty) \in \bar{\partial}_\psi F(x)$, $(-g, +\infty) \in \bar{N}_K(x)$. □



Corollary 4.2 For a continuous function $F$ condition (8) is necessary,
i.e. (8) is satisfied for all local minimizers of $F$ on $K$.



The next proposition shows that the optimality conditions are also
satisfied for limits $x$ of some local minimizers $x_\epsilon$ of the relaxed
problems $\min\{F(x) \mid x \in K_\epsilon = K + \epsilon B\}$. It follows
that, although the global minimum $x = 0$ of problem (6) does not satisfy the
conditions of Theorem 4.1, it falls within the scope of the next statement
and thus satisfies optimality condition (8).

Corollary 4.3 Let $x_\epsilon$ be a local minimizer such that there exists a
sequence $x^k_\epsilon \to x_\epsilon$, $x^k_\epsilon \in K_\epsilon$, with $F$
continuous at $x^k_\epsilon$ and $F(x^k_\epsilon) \to F(x_\epsilon)$ as
$k \to \infty$. Assume $x_{\epsilon_m} \to x$ for some $\epsilon_m \downarrow 0$
as $m \to \infty$. Then (8) is satisfied at $x$.

The proof follows from Theorem 4.1 and closedness of the (extended)
mollifier subdifferential mapping $x \to \bar{\partial}_\psi F(x)$ and the
(extended) normal cone mapping $(x, \epsilon) \to \bar{N}_{K_\epsilon}(x)$
(Corollary 4.1).



Theorem 4.2 If $F$ is strongly lsc and the constraint set $K$ is compact
then the set $X^*$ of points satisfying optimality condition (8) is
nonempty and contains at least one global minimizer of $F$ in $K$.

Proof. Construct a sequence of differentiable averaged functions $F^k$
epi-converging to $F$ (which is possible by Theorem 3.2). Relax the
constraint set $K$, i.e. define $K_\epsilon = K + \epsilon B$, where
$B = \{x \mid \|x\| \le 1\}$, $\epsilon > 0$. Find a global minimizer
$x^k_\epsilon$ of $F^k$ over $K_\epsilon$. For $x^k_\epsilon$ we have the
necessary optimality condition (see Rockafellar and Wets [19]):

$$-\nabla F^k(x^k_\epsilon) \in N_{K_\epsilon}(x^k_\epsilon). \qquad (9)$$

We can assume that $x^k_\epsilon \to y_\epsilon \in K_\epsilon$. From here it
follows that

$$-\bar{\partial} F(y_\epsilon) \cap \bar{N}_{K_\epsilon}(y_\epsilon) \ne \emptyset.$$

Now let $y_\epsilon \to y \in K$ as $\epsilon \to 0$. By Theorem 3.1 $y$ is a
global minimizer of $F$ on $K$. Then by closedness of the mappings
$\bar{\partial} F(\cdot)$ and $\bar{N}_{K_\epsilon}(\cdot)$ we finally obtain

$$-\bar{\partial} F(y) \cap \bar{N}_K(y) \ne \emptyset, \qquad (10)$$

i.e. $y \in X^*$. □

The proof of Theorem 4.2 also clarifies the structure of the set $X^*$:
(9) is satisfied for local minima of $F^k$ on $K_\epsilon$, hence (10) is
satisfied for their limit points.

Now let us come back to problem (7) and show how the developed
theory resolves the difficulties described above.

Example 4.1 Consider again optimization problem (7). Then we have

$$\bar{\partial}_\psi \sqrt{x}\,\big|_{x=0} = (+1, +\infty), \qquad
\bar{N}_{x \ge 0}(0) = \{(-1, b) \mid b \in \bar{R}_+\}$$

and thus

$$-\bar{\partial}_\psi \sqrt{x}\,\big|_{x=0} \cap \bar{N}_{x \ge 0}(0) = (-1, +\infty) \ne \emptyset.$$

5 On numerical optimization procedures 

Theorems 3.1 and 4.2 immediately give at least the following idea for the
approximate solution of problem (1), (2). Let us fix a small smoothing
parameter $\theta$ and a small constraint relaxation parameter $\epsilon$,
choose a mollifier $\psi_\theta(\cdot) = \frac{1}{\theta^n}\psi(\cdot/\theta)$
and, instead of the original discontinuous optimization problem, consider a
relaxed smoothed optimization problem:

$$\min[F^\theta(x) \mid x \in K_\epsilon], \qquad (11)$$

where $F^\theta(x)$ is defined by (3). Then the stochastic gradient method for
solving (11) has the form:

$x^0$ is an arbitrary starting point;

$$x^{k+1} = \Pi_{K_\epsilon}\bigl(x^k - \rho_k\,\xi_\theta(x^k)\bigr), \qquad k = 0, 1, \ldots; \qquad (12)$$

where $\Pi_{K_\epsilon}$ denotes the orthogonal projection operator on the set
$K_\epsilon$ and the positive step multipliers $\rho_k$ satisfy the conditions

$$\sum_{k=0}^{\infty} \rho_k = +\infty, \qquad \sum_{k=0}^{\infty} \rho_k^2 < +\infty. \qquad (13)$$

The vectors $\xi_\theta(x^k)$ can be called stochastic mollifier gradients.

The convergence of this kind of stochastic gradient method to a
stationary set

$$X^\theta_\epsilon = \{x \in K_\epsilon \mid -\nabla F^\theta(x) \in N_{K_\epsilon}(x)\}$$

(containing local and global minima of $F^\theta(x)$ on $K_\epsilon$) follows
from the results of [5]. Now passing to the limit in $\theta \to 0$ and then
in $\epsilon \to 0$ we see that the limit points
$[\limsup_\epsilon(\limsup_\theta X^\theta_\epsilon)]$ satisfy optimality
condition (8).
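
Iteration (12) can be sketched in one dimension as follows. Everything specific here is an assumption made for illustration: the objective `F` (a quadratic bowl with a unit jump at 0.6), the set $K = [0, 1]$, the Gaussian mollifier with the symmetric finite-difference estimate, and the step rule $\rho_k = 0.5/(k+1)$, which satisfies (13).

```python
import random


def F(x):
    """Hypothetical discontinuous objective: a smooth bowl plus a unit jump at 0.6."""
    return (x - 0.3) ** 2 + (1.0 if x > 0.6 else 0.0)


def project(x, lo, hi):
    """Orthogonal projection of a scalar onto the interval [lo, hi]."""
    return min(hi, max(lo, x))


def projected_sgd(F, x0, theta, eps, steps, seed=0):
    """Iteration (12) on the relaxed set K_eps = [-eps, 1 + eps] for K = [0, 1]."""
    rng = random.Random(seed)
    x = x0
    for k in range(steps):
        eta = rng.gauss(0.0, 1.0)
        # Stochastic mollifier gradient: symmetric finite difference of F.
        xi = (F(x + theta * eta) - F(x - theta * eta)) * eta / (2.0 * theta)
        rho = 0.5 / (k + 1)   # sum of rho_k diverges, sum of rho_k^2 converges
        x = project(x - rho * xi, -eps, 1.0 + eps)
    return x


if __name__ == "__main__":
    print(projected_sgd(F, x0=0.9, theta=0.05, eps=0.01, steps=5000))
```

In this toy problem the iterates start to the right of the jump and settle near the minimizer 0.3 of the smoothed problem; only values of `F` are ever evaluated.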

6 Conclusions 

In the paper we formulated optimality conditions (8), based on extended
mollifier subdifferentials $\bar{\partial}_\psi F(x)$, for constrained
($x \in K$) optimization problems with a strongly lower semicontinuous
objective function $F(x)$. In unconstrained cases these conditions reduce to
the familiar form $0 \in \partial_\psi F(x)$ (Theorem 3.3) and are necessary,
i.e., they are satisfied for all local minima of the problem. Optimality
conditions (8) are also necessary in the case of a continuous objective
function $F(x)$. In discontinuous cases the situation is more complicated:
we cannot guarantee that all local minimizers, or even all global
minimizers, satisfy (8). The reason for this is that not all local and
global minimizers of $F$ on $K$ are achievable through the minimization of
averaged functions. Nevertheless the set $X^*$ of points satisfying (8) is
nonempty, contains at least one global minimizer, and its structure is
clarified by Theorems 4.1, 4.2 and Corollaries 4.2, 4.3. Optimality
condition (8) is constructive: it leads to the numerical procedure (12) for
finding elements of $X^*$. Limits (as $\theta \to 0$ and $\epsilon \to 0$) of
local and global minimizers of problems (11) belong to $X^*$.

On the Equivalence in Stochastic Programming with Probability and Quantile Objectives

4, Volokolamskoe Shosse, Moscow, 125871 RUSSIA



Abstract. The equivalence between quantile functional minimization
and probability functional maximization is discussed under the assumption
that the probability measure may depend on the optimized strategy. The
equivalence makes it possible to obtain a solution of each of these
problems by solving the other one. Weakened sufficient conditions for the
equivalence are presented. These conditions are more general and easier to
verify than the known ones. They are applied to prove the optimality of a
satellite orbital correction with respect to a quantile performance index.



1 Introduction 

The subject of this article is an interconnection between solutions for two 
stochastic programming models. The first one contains an objective which is 
a probability functional. The second model contains the quantile functional. 
Such models have a lot of applications in Engineering and Economics. The 
probability functional can be used when we intend either to increase the re- 
liability of a technical system under some technical constraints or to reduce 
the risk of obtaining the undesirable outcomes when making a financial deci- 
sion. As a rule, the probability functional has to be maximized with respect 
to a strategy. The quantile objective is an inverse function to the probability 
functional. It appears when we should optimize another performance (e.g. 
the cost) of either a system or a decision under the given level of either the 
reliability or the risk. We set 

reliability = 1 — risk. 

In this relation the quantile objectives were first introduced in [1], where
they were called confidence limits. We assume that the quantile objective has
to be minimized. We see that these optimization problems are similar to one
another, so an interconnection between their solutions should exist. This
interconnection is called an equivalence. The equivalence allows us to get a
solution of each problem by solving the other one. Its rigorous definition
will be presented in sec. 2.

The motivation for this research comes from the following three reasons.

Firstly, the state of the art in this field is such that there are methods [2]
for solving each of the optimization problems in question. These methods have
different conditions of applicability. For this reason we may be able to solve
one of the problems while the other one remains hard. In such a situation the
above-mentioned interconnection can turn out to be very useful.

Secondly, there are sufficient conditions [2] for the equivalence between the
problems in question. These conditions will be discussed in sec. 2, where we
shall note that they can be very restrictive and hard to verify.

Thirdly, the known results on the subject concern problems where the
probability measure does not depend on the optimized strategy. This condition
is not typical for the infinite-dimensional models which arise in Multi-Stage
Stochastic Programming and Stochastic Optimal Control.

Taking into account these reasons we formulate our goal as generalization 
of the known equivalence conditions in order to overcome the obstacles men- 
tioned above. The basic results will be given in sec. 3 where we shall prove 
two theorems. We shall also obtain two-sided bounds for the optimal values 
of the objectives in question. 

In sec. 4 we shall apply a presented theorem to verify the optimality of a 
satellite orbital correction. We shall deal there with a model containing a 
controlled probability measure. 



2 Problem statement and some remarks 

Let $(\Omega, \mathcal{B})$ be a measurable space, $U$ be a set of strategies
$u$, $\{\mathcal{P}_u\}$ be a family of probability measures defined on
$\mathcal{B}$, and $\Phi(u, \omega)$ and $Q(u, \omega)$ be
$\mathcal{B}$-measurable functionals defined on $U \times \Omega$. The
probability functional is defined as follows:

$$P_\varphi(u) \triangleq \mathcal{P}_u\{\omega : \Phi(u, \omega) \le \varphi, \ Q(u, \omega) \le 0\}, \qquad (1)$$

where $\varphi$ is a scalar parameter. If $u$ is fixed then the probability
functional as a function of $\varphi$ is a distribution function for the
improper random variable

$$\tilde{\Phi}(u, \omega) = \begin{cases}
\Phi(u, \omega), & \text{if } Q(u, \omega) \le 0, \\
+\infty, & \text{if } Q(u, \omega) > 0.
\end{cases}$$

Therefore $P_\varphi(u)$ is a right-continuous non-decreasing function of
$\varphi$.

Let us define the quantile functional in the following way:

$$\varphi_\alpha(u) \triangleq \min\{\varphi : P_\varphi(u) \ge \alpha\}, \qquad (2)$$

where $\alpha \in (0, 1)$ is a given probability. Since $\tilde{\Phi}$ is
improper, we set by definition $\varphi_\alpha(u) = +\infty$ if there is no
$\varphi$ satisfying the probabilistic constraint
$P_\varphi(u) \ge \alpha$. We surely deal with such a situation when

$$\alpha > P^* \triangleq \sup_{u \in U} \mathcal{P}_u\{\omega : Q(u, \omega) \le 0\}$$

and can meet it when $\alpha = P^*$.

We should note that if we fix $u$ and investigate the behaviour of
$P_\varphi(u)$ and $\varphi_\alpha(u)$ with respect to the parameters
$\varphi$ and $\alpha$, there are no new mathematical problems caused by the
controlled measure, i.e. by the fact that the probability measure depends on
the strategy. A detailed survey can be found in [2], where the basic
properties of the functions $\varphi_\alpha(u)$ and $P_\varphi(u)$ are also
described for the constant measure $\mathcal{P}_u = \mathcal{P}$ and a
finite-dimensional strategy set $U$. In particular, it is known that the
quantile functional considered as a function of $\alpha$ is
left-continuous and non-decreasing.
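
For a fixed strategy $u$, definitions (1) and (2) can be illustrated on a finite sample. The sketch below is only an empirical analogue under the assumption that the pairs $(\Phi, Q)$ have been simulated beforehand and are equally weighted; the function names are hypothetical.

```python
import math


def probability_functional(phi, losses, constraints):
    """Empirical P_phi(u): share of samples with Phi <= phi and Q <= 0."""
    hits = sum(1 for l, q in zip(losses, constraints) if l <= phi and q <= 0)
    return hits / len(losses)


def quantile_functional(alpha, losses, constraints):
    """Empirical phi_alpha(u) = min{phi : P_phi(u) >= alpha}.

    Returns +inf when no phi satisfies the probabilistic constraint,
    i.e. when alpha exceeds the share of samples with Q <= 0.
    """
    n = len(losses)
    feasible = sorted(l for l, q in zip(losses, constraints) if q <= 0)
    for j, value in enumerate(feasible, start=1):
        if j / n >= alpha:      # P_value(u) = j / n when sample values are distinct
            return value
    return math.inf


if __name__ == "__main__":
    losses = [1.0, 2.0, 3.0, 4.0]
    constraints = [0.0, 0.0, 0.0, 1.0]     # the last sample violates Q <= 0
    print(probability_functional(2.0, losses, constraints))
    print(quantile_functional(0.5, losses, constraints))
    print(quantile_functional(0.8, losses, constraints))
```

The last call shows the improper case: the level 0.8 exceeds the probability 0.75 of satisfying the constraint, so the quantile is $+\infty$.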

The subject of this paper is the interconnection between the solutions of
two optimization problems. The first problem is the probability functional
maximization

maximize $P_\varphi(u)$ subject to $u \in U$. (3)

The second one is the quantile functional minimization

minimize $\varphi_\alpha(u)$ subject to $u \in U$. (4)

Most of the known methods [2] for solving these problems are hardly
applicable here due to the controlled measure, since the basic properties of
the functionals in question are not clear in this case. Moreover, the problem
of establishing these properties seems to be hard, since we need a tool in
order to describe the controlled measure $\mathcal{P}_u$.
There is an idea [3] for transforming problems (3) and (4) into one another.
It is easy to see that the functionals (1) and (2), considered as functions of
the parameters $\varphi$ and $\alpha$, are inverse to one another in the
generalized sense. We emphasize that they are inverse if they are strictly
increasing in their domains. Let us denote by $U_\varphi$ and $U^\alpha$ the
sets of optimal strategies $u_\varphi$ and $u^\alpha$ for problems (3) and
(4), respectively, let $M^* \triangleq (0, P^*)$, and let $N^*$ be the range of
$\Phi(u, \omega)$ for $Q(u, \omega) \le 0$, $u \in U$ and
$\omega \in \Omega$. The following two definitions and a theorem
(Theorem 4.4) are taken from [2].

Definition 1. Let $\alpha \in M^*$, $u^\alpha \in U^\alpha \ne \emptyset$ and
$\varphi \triangleq \varphi_\alpha(u^\alpha)$. If $U^\alpha = U_\varphi$
then problem (4) is equivalent to (3) with the parameter $\varphi$.

Definition 2. Let $\varphi \in N^*$, $u_\varphi \in U_\varphi \ne \emptyset$
and $\alpha \triangleq P_\varphi(u_\varphi)$. If $U^\alpha = U_\varphi$
then problem (3) is equivalent to (4) with the parameter $\alpha$.

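Theorem 1. Let $\mathcal{P}_u$ not depend on $u$ and let the following
conditions hold:

(i) for every $\alpha \in A \subset M^*$ the set $U^\alpha$ is non-empty, and
$\varphi_\alpha(u^\alpha) \in B \subset N^*$;

(ii) for every $\varphi \in B_1 \subset B \subset N^*$ the set $U_\varphi$ is
non-empty, and $P_\varphi(u_\varphi) \in A$;

(iii) $\mathrm{int}(A) \ne \emptyset$ and $\mathrm{int}(B_1) \ne \emptyset$;

(iv) $\mathcal{P}\{\omega : \Phi(u, \omega) = \varphi, \ Q(u, \omega) \le 0\} = 0$ and
$\mathcal{P}\{\omega : |\Phi(u, \omega) - \varphi| < \varepsilon, \ Q(u, \omega) \le 0\} > 0$
for all $\varepsilon > 0$, $u \in U$ and $\varphi \in B$.

Then

$$U^\alpha = \bigcup_{\varphi \in B_1} \{u_\varphi : P_\varphi(u_\varphi) = \alpha\} \quad \text{for all } \alpha \in A$$

and

$$U_\varphi = \bigcup_{\alpha \in A} \{u^\alpha : \varphi_\alpha(u^\alpha) = \varphi\} \quad \text{for all } \varphi \in B.$$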

We see from this theorem that if its conditions hold we can obtain a solution
of each problem in question by solving the other one. For example, suppose
we deal with problem (4) under the conditions of Theorem 1 and a solution
$u_\varphi$ of (3) can easily be obtained for every $\varphi$. Then, solving
the equation $P_\varphi(u_\varphi) = \alpha$ with respect to $\varphi$, we can
assert that the corresponding strategy $u_\varphi$ is optimal for (4).

Note that the conditions of Theorem 1 are very restrictive, since they must
hold, in particular, for every $u \in U$. This fact often leads to hard
problems when the strategy $u$ belongs to some functional space. Moreover,
this theorem requires that the probability measure does not depend on $u$.

3 Equivalence conditions 

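We first obtain two-sided bounds on the optimal values of the objectives
$P_\varphi(u)$ and $\varphi_\alpha(u)$. The bounds are stated on the basis of
the monotonicity of the functions $P_\varphi(u)$, $\varphi_\alpha(u)$ and
their optimal values

$$F(\varphi) \triangleq \sup_{u \in U} P_\varphi(u), \qquad G(\alpha) \triangleq \inf_{u \in U} \varphi_\alpha(u). \qquad (5)$$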

Lemma 1. Let $a$ and $b$ be real numbers such that $b > a$ and
$F(b) > F(a)$. Then $a \le G(\alpha) \le b$ for every
$\alpha \in (F(a), F(b))$.

Proof. Let $\alpha \in (F(a), F(b))$. Suppose that $G(\alpha) < a$. Then
according to the definition of the infimum there exists a strategy $u$ such
that $h \triangleq \varphi_\alpha(u) < a$. From (2) it follows that
$P_h(u) \ge \alpha$, hence $F(h) \ge \alpha > F(a)$. But this inequality is
impossible, since the function $F(\cdot)$ is non-decreasing. From the
obtained contradiction we conclude that $G(\alpha) \ge a$. Further, from
$F(b) > \alpha$ it follows according to the definition of the supremum that
there exists a strategy $u$ for which $P_b(u) > \alpha$. By (2) we obtain
$\varphi_\alpha(u) \le b$. It follows that $G(\alpha) \le b$.

The lemma is proved.

Lemma 2. Let $\gamma$ and $\beta$ be some probabilities such that
$\gamma > \beta$ and $G(\gamma) > G(\beta)$. Then
$\beta \le F(\varphi) \le \gamma$ for every
$\varphi \in (G(\beta), G(\gamma))$.

Proof. Let $\varphi > G(\beta)$. Then there exists $u$ such that
$\varphi > \varphi_\beta(u)$. Hence $P_\varphi(u) \ge \beta$. It follows that
$F(\varphi) \ge \beta$.

Assume that $F(\varphi) > \gamma$ for some
$\varphi \in (G(\beta), G(\gamma))$. Then there exists $u$ such that
$P_\varphi(u) > \gamma$. It follows that $\varphi_\gamma(u) \le \varphi$,
hence $G(\gamma) \le \varphi$. The last inequality contradicts the above
condition $\varphi \in (G(\beta), G(\gamma))$. Thus $F(\varphi) \le \gamma$.
The lemma is proved.

Following the equivalence idea, we need to deal with the equations
$F(\varphi) = \alpha$ and $G(\alpha) = \varphi$. We know that the functions $F(\varphi)$ and
$G(\alpha)$ are always non-decreasing, whereas the question of their continuity seems to be hard.
We emphasize that the monotonicity and continuity of these functions are
necessary to use the conventional notion of the root of an equation. To overcome
the continuity problem we slightly modify this notion.

Definition 3. Let $f(x)$ be a non-decreasing function of a scalar argument.
A point $x_0$ such that

$$f(x_0 - \varepsilon) \le 0 \le f(x_0 + \varepsilon)$$

for every $\varepsilon > 0$ is called a root of the equation $f(x) = 0$. This root is single if

$$f(x_0 - \varepsilon) < 0 < f(x_0 + \varepsilon).$$
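
A root in the sense of Definition 3 can be located by bisection even when $f$ jumps over zero. The following is a minimal sketch under our own assumptions: the names `generalized_root`, `is_single_root` and `step`, the bracketing interval and the tolerance are illustrative and do not come from the paper.

```python
def generalized_root(f, lo, hi, tol=1e-9):
    """Bisection for a non-decreasing f with f(lo) <= 0 <= f(hi).

    Returns a point x0 with f(x0 - e) <= 0 <= f(x0 + e) for every e > tol,
    i.e. an approximate root in the sense of Definition 3.
    """
    if not (f(lo) <= 0 <= f(hi)):
        raise ValueError("need f(lo) <= 0 <= f(hi)")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid      # invariant f(lo) <= 0 is kept
        else:
            hi = mid      # invariant f(hi) >= 0 is kept
    return 0.5 * (lo + hi)


def is_single_root(f, x0, eps):
    """Check the strict inequalities f(x0 - eps) < 0 < f(x0 + eps) for one eps."""
    return f(x0 - eps) < 0 < f(x0 + eps)


# Hypothetical discontinuous example: f jumps over zero at x = 0.
def step(x):
    if x < 0:
        return -0.5
    if x == 0:
        return -0.2
    return 0.5


x0 = generalized_root(step, -1.0, 1.0)
```

Here `step` never takes the value zero, so the conventional notion of a root fails, while the bisection still converges to the jump point.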

Theorem 2. Let $\varphi_\alpha$ be a single root of the equation $F(\varphi) = \alpha$. Then
$G(\alpha) = \varphi_\alpha$. Moreover, if for $\varphi = \varphi_\alpha$ there exists a solution
$u_\varphi$ of problem (3) and the inequality $F(\varphi_\alpha) \ge \alpha$ holds, then
$u_\varphi$ is also a solution of problem (4).

Proof. The first assertion of the theorem is an easy consequence of
Lemma 1. In fact, for every $\varepsilon > 0$ we have

$$F(\varphi_\alpha - \varepsilon) < \alpha < F(\varphi_\alpha + \varepsilon).$$

Hence by Lemma 1 we can write

$$\varphi_\alpha - \varepsilon \le G(\alpha) \le \varphi_\alpha + \varepsilon.$$

Since $\varepsilon$ is an arbitrary positive number, we conclude that $G(\alpha) = \varphi_\alpha$.
Let us prove the second assertion. Let $u_\varphi$ be a solution of (3) for $\varphi = \varphi_\alpha$.
According to the definition of the infimum function we have

$$\varphi_\alpha = G(\alpha) \le \Phi_\alpha(u_\varphi). \tag{6}$$

On the other hand, due to the theorem condition $F(\varphi_\alpha) \ge \alpha$ it follows that
$P_{\varphi_\alpha}(u_\varphi) = F(\varphi_\alpha) \ge \alpha$. Hence from (2) we deduce

$$\varphi_\alpha \ge \Phi_\alpha(u_\varphi). \tag{7}$$

Inequalities (6) and (7) are compatible iff $\varphi_\alpha = \Phi_\alpha(u_\varphi)$. This means that
$u_\varphi$ is optimal for (4).
The theorem is proved.

Theorem 2 gives us a tool to get a solution of the quantile minimization problem
(4) by solving problem (3). We see that we need to verify that the root $\varphi_\alpha$
is single and $F(\varphi_\alpha) \ge \alpha$. The first condition can be verified, for example, by
constructing the plot of $F(\varphi)$ in a neighbourhood of $\varphi_\alpha$. The second one can
be verified by Monte Carlo simulation.

The next proposition allows us to carry out an inverse transformation, i.e. 
to reduce problem (3) to (4). 

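Theorem 3. Let $\alpha_\varphi$ be a single root of the equation $G(\alpha) = \varphi$. Then
$F(\varphi) = \alpha_\varphi$. Moreover, if for $\alpha = \alpha_\varphi$ there exists a solution
$u^\alpha$ of problem (4) and $G(\alpha_\varphi) \le \varphi$, then $u^\alpha$ is also a solution of (3).

Proof. The first assertion follows directly from Lemma 2. In fact, since
$\alpha_\varphi$ is a single root, for every $\varepsilon > 0$ we have

$$G(\alpha_\varphi - \varepsilon) < \varphi < G(\alpha_\varphi + \varepsilon).$$

Applying Lemma 2 we obtain

$$\alpha_\varphi - \varepsilon \le F(\varphi) \le \alpha_\varphi + \varepsilon.$$

Therefore $F(\varphi) = \alpha_\varphi$.

Let us prove the second assertion. Since $F(\varphi)$ is a supremum function, the
following inequality

$$\alpha_\varphi = F(\varphi) \ge P_\varphi(u^\alpha) \tag{8}$$

holds. On the other hand, from the theorem condition $G(\alpha_\varphi) \le \varphi$ it follows
that $\Phi_{\alpha_\varphi}(u^\alpha) = G(\alpha_\varphi) \le \varphi$. Hence

$$P_\varphi(u^\alpha) \ge \alpha_\varphi. \tag{9}$$

Inequalities (8) and (9) are satisfied simultaneously iff $\alpha_\varphi = P_\varphi(u^\alpha)$, which
means that $u^\alpha$ is optimal for (3).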

The theorem is proved. 

Remark. In Theorems 2 and 3 it is supposed that there exist solutions
$u_\varphi$ and $u^\alpha$ of problems (3) and (4). Both theorems cover the cases
where several such solutions exist, since it is easy to see that the proofs of
these theorems are valid for all strategies $u_\varphi$ and $u^\alpha$ which are optimal for
problems (3) and (4), respectively.

Example. Let $\xi(\omega)$ be a random variable distributed uniformly over $[0,1]$,
$Q(u,\omega) = 0$, $U = \mathbb{R}^1$ and

$$\Phi(u,\omega) = \begin{cases} |u|, & u \ne 0, \\ 0, & \xi(\omega) \in [0,p] \text{ and } u = 0, \\ 1, & \xi(\omega) \in (p,1] \text{ and } u = 0. \end{cases}$$

It is easy to verify that in this case a solution $u_\varphi$ of problem (3) exists and
can be obtained analytically for every $\varphi$. The supremum function $F(\varphi)$ takes
the following form

$$F(\varphi) = \begin{cases} 0, & \varphi < 0, \\ p, & \varphi = 0, \\ 1, & \varphi > 0. \end{cases}$$

Thus the equation $F(\varphi) = \alpha$ has the single root $\varphi_\alpha = 0$ for every $\alpha \in (0, 1)$.
According to Theorem 2 we can guarantee that $u_\varphi|_{\varphi=0}$ (it turns out to be
equal to zero) is optimal for (4) if $\alpha \le p$. For $\alpha > p$ the condition $F(\varphi_\alpha) \ge \alpha$
of Theorem 2 does not hold. In this case it can be shown that for such $\alpha$
problem (4) has no optimal solution, for every $p$. Thus the condition $F(\varphi_\alpha) \ge \alpha$ is
quite essential for the equivalence idea to be applicable.
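
The case analysis of the example can be checked numerically. The sketch below is illustrative only: the function names, the grid of strategies `us` and the value `p = 0.3` are our assumptions, and the closed forms are written out from the definition of the loss in the example.

```python
def prob(phi, u, p):
    """P_phi(u) = P{Phi(u, omega) <= phi} for the example loss."""
    if u != 0:
        return 1.0 if abs(u) <= phi else 0.0
    # u == 0: the loss is 0 with probability p and 1 with probability 1 - p
    if phi < 0:
        return 0.0
    return p if phi < 1 else 1.0


def quantile(alpha, u, p):
    """Phi_alpha(u) = min{phi : P_phi(u) >= alpha}."""
    if u != 0:
        return abs(u)
    return 0.0 if alpha <= p else 1.0


def F(phi, p):
    """Supremum of P_phi(u) over all real u, from the case analysis."""
    if phi < 0:
        return 0.0
    return p if phi == 0 else 1.0


p = 0.3
us = [k / 1000 for k in range(-1000, 1001)]   # grid of strategies, contains u = 0

# alpha <= p: the quantile is minimised by u = 0, as Theorem 2 predicts.
best_low = min(quantile(0.2, u, p) for u in us)
# alpha > p: u = 0 gives quantile 1, while small u != 0 give values close to 0
# without ever reaching the infimum 0, so no minimiser exists.
best_high = min(quantile(0.6, u, p) for u in us)
```

On the grid, `best_high` equals the smallest nonzero grid value of $|u|$ and keeps shrinking as the grid is refined, which reflects the missing minimiser for $\alpha > p$.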



4 A satellite orbital correction 



In this section we illustrate numerical techniques for the application of the
above theoretical results by an example in which we optimize the last orbital
correction for a satellite that is required to be geosynchronous.
The following mathematical model for the correction is taken from [3]. Let
$d$ be the terminal drift speed, i.e. the observed longitudinal bias per
satellite revolution after the correction, $\xi_0$ be the initial drift speed before the
correction and $u = u(\xi_0)$ be a correcting impulse which is implemented with
an error $u \cdot \xi_1$. Then

$$d = \xi_0 + u \cdot (1 + \xi_1).$$

Suppose that $\xi_0 = \xi_0(\omega)$ and $\xi_1 = \xi_1(\omega)$ are independent random variables
with normal distributions such that $E[\xi_0] = m$, $\mathrm{var}[\xi_0] = \sigma_0^2$, $E[\xi_1] = 0$ and
$\mathrm{var}[\xi_1] = \sigma_1^2$. Suppose also that every Borel-measurable function $u(\cdot)$ is a
feasible correcting strategy. The probability measure $\mathcal{P}_u$ is induced on the
Borel algebra $\mathcal{B}(\mathbb{R}^1)$ by the distribution of $d$. Set $\Phi(u(\cdot), \omega) = |d|$, $Q(u(\cdot), \omega) = 0$
and consider problem (4) with the given reliability level $\alpha \in (0, 1)$.

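In [3] the following relations defining a strategy $u_\varphi$ which is optimal for (3)
for every $\varphi > 0$ have been obtained:

$$u_\varphi(\xi_0) = \begin{cases} -2\xi_0/(2 + a + b), & x_\varphi > 1, \\ 0, & x_\varphi \le 1, \end{cases} \qquad x_\varphi = \frac{|\xi_0|}{\varphi},$$

where $a$ and $b$ satisfy

$$b^2 - a^2 = 2\sigma_1^2 \ln\frac{x_\varphi + 1}{x_\varphi - 1}, \qquad x_\varphi (b - a) = 2 + a + b, \qquad 2 + a + b > 0.$$

It is easy to verify that

$$u_\varphi(\xi_0) = \begin{cases} -2\xi_0 \Big/ \left(1 + \sqrt{1 + 2\sigma_1^2 x_\varphi \ln\dfrac{x_\varphi + 1}{x_\varphi - 1}}\right), & x_\varphi > 1, \\ 0, & x_\varphi \le 1. \end{cases}$$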
Figure 1: The supremum function $F(\varphi)$

The question is: can we assert that this feedback control with the inert
zone is optimal for (4) if we choose $\varphi$ as a root of the equation $F(\varphi) = \alpha$,
i.e. can we apply the equivalence idea here? Following the above
recommendations on the application of Theorem 2, we have constructed the
plot of $F(\varphi) = P_\varphi(u_\varphi)$ (see Figure 1) using a dense net of points with
distance 0.001 between neighbouring points. The density of the net
allows us to disregard plotting errors. The calculation of $F(\varphi)$ has been
carried out by the Monte Carlo method with 150 000 realizations for $m = 1^\circ$,
$\sigma_0 = 0.3^\circ$ and $\sigma_1 = 0.05^\circ$. We note also that the function $F(\varphi)$ is similar
to a usual distribution function. It is known that Monte Carlo simulation
usually yields a jumping curve as an estimate of a distribution function
if the number of realizations is small; increasing the number of realizations
decreases the jumps. In our opinion, the quite
smooth behaviour of $F(\varphi)$ allows us to disregard calculation errors. Thus we
conclude that the supremum function $F(\varphi)$ is most probably continuous and
strictly increasing, so the conditions of Theorem 2 are likely to hold.
Hence we can utilize the equivalence idea and assert that
the strategy $u_\varphi$ is optimal for (4) for the value $\varphi = \varphi_\alpha$ which
is defined as a root of the equation $F(\varphi) = \alpha$, where $\alpha \in (0.77, 0.99)$. The
limits for $\alpha$ can be widened by more careful calculation and plotting.
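
The plotting procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' program: the closed form used in `u_inert_zone` is our reading of the inert-zone law (it follows from the stationarity condition of a normal interval probability), the sample size and grid are much smaller than the 150 000 realizations and the 0.001 spacing quoted in the text, and all names are hypothetical.

```python
import math
import random


def u_inert_zone(xi0, phi, sigma1):
    """Inert-zone feedback law (assumed closed form, see the lead-in)."""
    x = abs(xi0) / phi
    if x <= 1.0:
        return 0.0                      # inside the inert zone: no correction
    log_term = math.log((x + 1.0) / (x - 1.0))
    root = math.sqrt(1.0 + 2.0 * sigma1 ** 2 * x * log_term)
    return -2.0 * xi0 / (1.0 + root)


def estimate_F(phis, m=1.0, sigma0=0.3, sigma1=0.05, n=20000, seed=0):
    """Monte Carlo estimate of P{|d| <= phi} under u_inert_zone for each phi > 0."""
    rng = random.Random(seed)
    # Common random numbers: the same sample is reused for every phi.
    sample = [(rng.gauss(m, sigma0), rng.gauss(0.0, sigma1)) for _ in range(n)]
    estimates = []
    for phi in phis:
        hits = 0
        for xi0, xi1 in sample:
            d = xi0 + u_inert_zone(xi0, phi, sigma1) * (1.0 + xi1)
            if abs(d) <= phi:
                hits += 1
        estimates.append(hits / n)
    return estimates


phis = [0.02 * k for k in range(1, 11)]   # 0.02, 0.04, ..., 0.20
F_hat = estimate_F(phis)
```

Reusing one sample for all grid points keeps the estimated curve from jittering between neighbouring values of $\varphi$, which matters when its smoothness is judged by eye.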
5 Conclusion

The weakened sufficient conditions for transforming problems (3) and (4)
into one another have been presented above in Theorems 2 and 3. As a rule,
they are verified after a strategy has been obtained which is optimal for the
alternative problem and seems to be optimal for the problem in question. Their
practical verification can be performed as shown in Section 4.
A Note on Multifunctions
in Stochastic Programming

Vlasta Kaňková¹



Abstract. Two-stage stochastic programming problems and chance constrained
stochastic programming problems belong to the deterministic optimization
problems depending on a probability measure. Surely, the probability
measure can be considered as a parameter of such problems and, moreover,
it is reasonable to investigate stability with respect to it. However, to
investigate the stability of the above mentioned problems means, mostly,
to investigate simultaneously the behaviour of the objective functions and of
the multifunctions corresponding to the constraint sets. The aim of the paper is
to investigate the stability of the multifunctions and thereby to summarize (from
the constraints point of view) the problems with stable behaviour.

Key words: Stochastic programming problems, multifunctions, Hausdorff 
distance, Kolmogorov metric. 

1 Introduction 

It is generally known that, to take a decision in practice, it is very often
suitable (or even necessary) to deal with an optimization problem in the form:

Find

$$\inf\{g_\nu(x) \mid x \in \mathcal{K}(\nu)\} \quad (= \varphi(\nu)), \tag{1}$$

where $g_\nu(x)$ is a real-valued function defined on $E_n$, $n \ge 1$, and $\mathcal{K}(\nu)$ is a
multifunction mapping a parametric space into the space of the subsets of
$E_n$; $\nu$ is a parameter of the problem. ($E_n$, $n \ge 1$, denotes an $n$-dimensional
Euclidean space.)

If the "value" of the parameter $\nu$ is known and, moreover, the solution
can be found with respect to its actual "value", then (1) is a problem of
deterministic optimization. However, it happens very often that the "value"
of $\nu$ has to be replaced by some approximate one. Consequently, it is
reasonable (or even necessary) to investigate the stability of (1) with respect to a
perturbation of the parameter $\nu$. Evidently, the behaviour of $\varphi(\nu)$ depends
essentially on the behaviour of the multifunction $\mathcal{K}(\nu)$.

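Let $(\Omega, S, P)$ be a probability space; $\xi^1 = \xi^1(\omega) = (\xi^1_1(\omega), \dots, \xi^1_{s_1}(\omega))$
and $\xi^2 = \xi^2(\omega) = (\xi^2_1(\omega), \dots, \xi^2_{s_2}(\omega))$ be $s_1$- and $s_2$-dimensional random
vectors defined on $(\Omega, S, P)$; $F^{\xi^1}(z^1)$, $z^1 \in E_{s_1}$, and $F^{\xi^2|\xi^1}(z^2|z^1)$, $z^2 \in E_{s_2}$, be
the distribution function of $\xi^1$ and the conditional distribution function
of $\xi^2$ (conditioned by $\xi^1(\omega)$), respectively.

¹Institute of Information Theory and Automation, Academy of Sciences of the Czech
Republic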

Let, moreover, z^), z^), i = 1, 2, ..., /2 and z^) 

be (real-valued) functions defined on Eny^ri 2 x ^ 5 i+s 2 » ^ui+ns ^ 5 i 

Er., X = [x^ X^], ~e = ^ ^-2 ^ ^2J. ^ 



C Eu 2 be nonempty sets; C Es^, Z^ = 



C E, 



denote the supports of the probability measures corresponding to the distri- 
bution functions 



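A rather general two-stage stochastic nonlinear programming problem can
be introduced in the following form:

I. Find

$$\inf\{E_{F^{\xi^1}} Q(x^1, \xi^1(\omega)) \mid x^1 \in X^1\} \quad (= \varphi(F^{\xi^1})), \tag{2}$$

where for $x^1 \in E_{n_1}$, $z^1 \in E_{s_1}$

$$Q(x^1, z^1) = g^1(x^1, z^1) + \inf\{E_{F^{\xi^2|\xi^1 = z^1}} f^2(x^2, \xi^2(\omega)) \mid x^2 \in \mathcal{K}^2(x^1, z^1)\}, \tag{3}$$

$$\mathcal{K}^2(x^1, z^1) = \{x^2 \in X^2 : g_i^2(x, z^1) \le 0,\ i = 1, 2, \dots, l_2\}, \quad x = [x^1, x^2], \tag{4}$$

$E_{F^{\xi^1}}$, $E_{F^{\xi^2|\xi^1}}$ denote the operators of mathematical expectation corresponding
to $F^{\xi^1}(z^1)$, $F^{\xi^2|\xi^1}(z^2|z^1)$.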

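Evidently, (for given $z^1$ and $x^1$) $\nu := F^{\xi^2|\xi^1 = z^1}$ can be considered as a
parameter of the inner problem:

Find

$$\inf\{E_{F^{\xi^2|\xi^1 = z^1}} f^2(x^2, \xi^2(\omega)) \mid x^2 \in \mathcal{K}^2(x^1, z^1)\}. \tag{5}$$

The corresponding multifunction $\mathcal{K}(\nu)$ and the function $\varphi(\nu)$ fulfil the relations

$$\mathcal{K}(\nu) := \mathcal{K}^2(x^1, z^1), \qquad \varphi(\nu) := Q(x^1, z^1) \quad \text{for fixed } x^1 \in X^1. \tag{6}$$

Moreover, $F^{\xi^1}(\cdot)$ can be considered as a parameter of the outer problem (2).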

Stochastic programming problems with individual and joint probability
constraints belong to the optimization problems in which the probability
measure occurs in the constraints. If $g_i^1(x^1, z^1)$, $i = 1, 2, \dots, l_1$ are
real-valued functions defined on $E_{n_1} \times E_{s_1}$, then the stochastic programming
problem with individual probability constraints can be introduced as the
problem:

II. Find

$$\inf\{E_{F^{\xi^1}} g^1(x^1, \xi^1(\omega)) \mid x^1 \in \bar{X}_{F^{\xi^1}}(\alpha)\} \quad (= \varphi(F^{\xi^1}, \alpha)), \tag{7}$$

$$\bar{X}_{F^{\xi^1}}(\alpha) = \bigcap_{i=1}^{l_1} X^i_{F^{\xi^1}}(\alpha_i), \tag{8}$$

$$X^i_{F^{\xi^1}}(\alpha_i) = \{x^1 \in X^1 : P_{F^{\xi^1}}\{\omega : g_i^1(x^1, \xi^1(\omega)) \le 0\} \ge \alpha_i\}, \quad i = 1, \dots, l_1,$$

where $\alpha = (\alpha_1, \dots, \alpha_{l_1})$, $\alpha_i \in (0, 1)$, $i = 1, \dots, l_1$ is a (known, fixed)
parameter of the problem. $P_{F^{\xi^1}}(\cdot)$ denotes the probability measure
corresponding to the distribution function $F^{\xi^1}(\cdot)$.

Evidently, in this case we can consider $\nu := F^{\xi^1}(\cdot)$ and, consequently,

$$\mathcal{K}(\nu) := \bar{X}_{F^{\xi^1}}(\alpha). \tag{9}$$

It is known from the literature that the stochastic programming problem
with joint probability constraints can be introduced as the problem:

III. Find

$$\inf\{E_{F^{\xi^1}} g^1(x^1, \xi^1(\omega)) \mid x^1 \in X_{F^{\xi^1}}(\alpha)\} \quad (= \varphi(F^{\xi^1}, \alpha)), \tag{10}$$

$$X_{F^{\xi^1}}(\alpha) = \{x^1 \in X^1 : P_{F^{\xi^1}}\{\omega : g_i^1(x^1, \xi^1(\omega)) \le 0,\ i = 1, 2, \dots, l_1\} \ge \alpha\}, \tag{11}$$

where $\alpha \in (0, 1)$ is a (known, fixed) parameter of the problem.

We can consider $\nu := F^{\xi^1}(\cdot)$ and, consequently,

$$\mathcal{K}(\nu) := X_{F^{\xi^1}}(\alpha). \tag{12}$$
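
The difference between the individual and the joint constraint sets can be seen on a small empirical example. The sketch below is an illustration under our own assumptions: the probabilities are replaced by relative frequencies over a simulated sample, the constraints have the separable form $f_i(x) \le \xi_i$, and the functions `fs`, the sample and all names are hypothetical.

```python
import random


def empirical_prob(event, sample):
    """Relative frequency of `event` over the sample points."""
    return sum(1 for z in sample if event(z)) / len(sample)


def in_individual_set(x, fs, alphas, sample):
    """x satisfies P{f_i(x) <= xi_i} >= alpha_i for each i separately."""
    return all(
        empirical_prob(lambda z, i=i, f=f: f(x) <= z[i], sample) >= a
        for i, (f, a) in enumerate(zip(fs, alphas))
    )


def in_joint_set(x, fs, alpha, sample):
    """x satisfies P{f_i(x) <= xi_i for all i} >= alpha."""
    def joint(z):
        return all(f(x) <= z[i] for i, f in enumerate(fs))
    return empirical_prob(joint, sample) >= alpha


rng = random.Random(1)
# Two independent standard normal components (an assumption of this sketch).
sample = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20000)]
fs = [lambda x: x, lambda x: x]           # hypothetical f_1(x) = f_2(x) = x
```

With equal levels the joint set is contained in the individual one, because the joint event is contained in each single-constraint event; a point such as $x = -1.5$ at level 0.9 lies in the individual set only.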

The stability of the multifunctions (corresponding to the stochastic
programming problems) has already been investigated in the literature, see e.g.
[4], [13], [18], where the continuity of $\bar{X}_{F^{\xi^1}}(\alpha)$, $X_{F^{\xi^1}}(\alpha)$ with respect to
the Kuratowski convergence was investigated. The continuity and the Lipschitz
property of the multifunctions (in the case of multistage programming
problems) with respect to the Hausdorff distance have been investigated
in [8], [9]. Moreover, the dependence of the continuity (and generally of the
stability and estimates) of the optimal value and the optimal solution set on
the behaviour of the corresponding multifunction appears in [2], [17].

In this paper, we focus our investigation mostly on the Lipschitz property
(with respect to the Hausdorff distance) of $\mathcal{K}^2(x^1, z^1)$, $\bar{X}_{F^{\xi^1}}(\alpha)$, $X_{F^{\xi^1}}(\alpha)$. In
particular, first, we shall deal with the Lipschitz property of the multifunction
$\mathcal{K}^2(x^1, z^1)$. (We recall that deterministic parametric optimization problems
have been investigated in the literature many times, see e.g. [1].) Furthermore, we
employ some of these assertions to investigate the problems with probability
constraints.



2 Assumptions and Auxiliary Assertions

In this section we shall introduce some assumptions and auxiliary assertions.
To this end, let first $f_i^*(x)$, $i = 1, 2, \dots, l$ be real-valued functions defined
on $E_n$ ($n, l \ge 1$) and $\mathcal{K}^*(y)$ be a multifunction defined by the relation

$$\mathcal{K}^*(y) = \{x \in X : f_i^*(x) \le y_i,\ i = 1, \dots, l,\ y = (y_1, \dots, y_l)\}, \quad y \in Y, \tag{13}$$

where $X \subset E_n$, $Y \subset E_l$ are nonempty sets.

Furthermore, let $Y_0$ denote the convex hull of $Y$ and $Y_0(\varepsilon)$, $\varepsilon > 0$, the
$\varepsilon$-neighbourhood of $Y_0$. We shall introduce the systems of the assumptions.

i.1 a. $f_i^*(x)$, $i = 1, 2, \dots, l$ are linear functions, $X = E_n^+$;
       without loss of generality, we can consider in this case the
       constraints in (13) to be in the form of equalities,

    b. for every $y \in Y_0(\varepsilon)$, $\mathcal{K}^*(y)$ is a nonempty, compact set,

    c. the matrix $A$ of the type $(l \times n)$, $l \le n$ fulfils the relation

       $\mathcal{K}^*(y) = \{x \in X : Ax = y\}$, $y \in Y$

       and, moreover, all its submatrices $A(1), \dots, A(m)$ of the type
       $(l \times l)$ are nonsingular,

    d. $C = l \max_{i,r,s} |a_{ir}(s)|$, where for $s \in \{1, 2, \dots, m\}$, $a_{ir}(s)$
       denote the elements of the inverse matrix to $A(s)$,

i.2 There exist real-valued constants $d_1, \gamma_1, \varepsilon > 0$ such that:

    a. If for $x \in X$, $y \in Y_0(\varepsilon)$, $y = (y_1, \dots, y_l)$, $f_i^*(x) \le y_i$,
       $i = 1, \dots, l$ and simultaneously $f_j^*(x) = y_j$ for at least one
       $j \in \{1, \dots, l\}$, then there exists a vector $x(0) \in E_n$ (generally
       depending on $x$), $\|x(0)\| = 1$ such that

       $x + d\,x(0) \in X$, $\quad f_i^*(x) - f_i^*(x + d\,x(0)) \ge \gamma_1 d$

       for every $d \in (0, d_1)$, $i = 1, 2, \dots, l$,

    b. for every $y \in Y_0(\varepsilon)$, $\mathcal{K}^*(y)$ is a nonempty, compact set,

c. 

($\|\cdot\| = \|\cdot\|_n$, $n \ge 1$, denotes the Euclidean norm in $E_n$.)

i.3 a. $X$ is a convex, compact set, $M_1 = \sup_{x(1), x(2) \in X} \|x(1) - x(2)\|$,

    b. $f_i^*(x)$, $i = 1, \dots, l$ are convex functions on $X$,

    c. for every $y \in Y_0(\varepsilon)$, $\mathcal{K}^*(y)$ is a nonempty set,

    d. $C = M_1/\varepsilon_0$ for an $\varepsilon_0 \in (0, \varepsilon)$.

Lemma 1. Let the relation (13) be satisfied. If at least one of the systems
of the assumptions i.1, i.2, i.3 is fulfilled, then

$$\Delta[\mathcal{K}^*(y(1)), \mathcal{K}^*(y(2))] \le C\|y(1) - y(2)\| \quad \text{for every } y(1), y(2) \in Y.$$

($\Delta[\cdot, \cdot] = \Delta_n[\cdot, \cdot]$, $n \ge 1$, denotes the Hausdorff distance of the subsets of $E_n$;
for the definition see e.g. [12].)

Proof. If the system of the assumptions i.1 or i.2 is fulfilled, then the
assertion of Lemma 1 follows from well-known results of linear programming
and convex analysis [16]. The assertion of Lemma 1 under the assumptions
i.3 is proven in [8] (see also [9]).
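
The Lipschitz bound of Lemma 1 can be observed numerically on a discretised example. The following sketch is illustrative only: the set $X = [0, 10]$, the single convex constraint $f(x) = x^2$, the grid spacing and the function names are our assumptions, and the Hausdorff distance is computed between finite grid approximations of the sets.

```python
import math


def hausdorff(A, B):
    """Hausdorff distance between two finite, nonempty point sets in R^n."""
    def one_sided(P, Q):
        return max(min(math.dist(p, q) for q in Q) for p in P)
    return max(one_sided(A, B), one_sided(B, A))


def K_star(y, grid, fs):
    """Grid version of K*(y) = {x in X : f_i(x) <= y_i for all i}."""
    return [x for x in grid if all(f(x) <= yi for f, yi in zip(fs, y))]


# Hypothetical data: X = [0, 10] discretised, one convex constraint f(x) = x^2.
grid = [(0.01 * k,) for k in range(1001)]
fs = [lambda x: x[0] ** 2]

# K*(1) is (approximately) [0, 1] and K*(4) is [0, 2].
d = hausdorff(K_star((1.0,), grid, fs), K_star((4.0,), grid, fs))
```

For $y \in [1, 4]$ the sets are the intervals $[0, \sqrt{y}]$, so the distance is $|\sqrt{y(1)} - \sqrt{y(2)}| \le \tfrac{1}{2}|y(1) - y(2)|$; the computed `d` is close to 1 for $y(1) = 1$, $y(2) = 4$.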
3 Main Results

3.1 Two-Stage Stochastic Programming Problem

Investigating the multifunctions corresponding to the two-stage stochastic
programming problems, first, we shall restrict ourselves to the case when

$$g_i^2(x, z^1) = f_i^2(x^2) - h_i^2(x^1, z^1), \quad i = 1, \dots, l_2, \quad x^1 \in X^1,\ x^2 \in X^2,\ z^1 \in Z^1,$$

where $f_i^2(x^2)$, $h_i^2(x^1, z^1)$, $i = 1, \dots, l_2$ are real-valued functions defined on
$E_{n_2}$ and $E_{n_1} \times E_{s_1}$. Consequently, in this case

$$\mathcal{K}^2(x^1, z^1) = \{x^2 \in X^2 : f_i^2(x^2) \le h_i^2(x^1, z^1),\ i = 1, \dots, l_2\}, \quad x^1 \in E_{n_1},\ z^1 \in E_{s_1}. \tag{14}$$

To investigate this special case, we substitute (into (13))

$$l := l_2, \quad n := n_2, \quad X := X^2, \quad f_i^*(x) := f_i^2(x^2), \quad i = 1, 2, \dots, l_2, \tag{15}$$

$Y := \{y \in E_{l_2},\ y = (y_1, \dots, y_{l_2}) :$ there exist $x^1 \in X^1$,
$z^1 \in Z^1$ such that $y_i = h_i^2(x^1, z^1),\ i = 1, \dots, l_2\}$,

$\mathcal{K}^*(y) := \{x^2 \in X^2 : f_i^2(x^2) \le y_i,\ i = 1, \dots, l_2,\ y = (y_1, \dots, y_{l_2})\}$, $y \in E_{l_2}$.

Evidently, in this case we can obtain

$$\mathcal{K}^*(y) = \mathcal{K}^2(x^1, z^1), \tag{16}$$

$y = (y_1, \dots, y_{l_2})$, $y_i = h_i^2(x^1, z^1)$, $i = 1, 2, \dots, l_2$, $x^1 \in E_{n_1}$, $z^1 \in E_{s_1}$.

Proposition 1. Let the relations (14), (15) be satisfied. If

1. at least one of the systems of the assumptions i.1, i.2, i.3 is fulfilled,

2. for every $x^1 \in X^1$, $h_i^2(x^1, z^1)$, $i = 1, \dots, l_2$ are Lipschitz functions
   on $Z^1$ with the Lipschitz constants $L_i$,

then

$$\Delta[\mathcal{K}^2(x^1, z^1(1)), \mathcal{K}^2(x^1, z^1(2))] \le C\,\|z^1(1) - z^1(2)\| \sum_{i=1}^{l_2} L_i$$

for every $z^1(1), z^1(2) \in Z^1$.

Proof. The assertion of Proposition 1 follows immediately from the
substitution (15), the relation (16), the assertion of Lemma 1 and the assumptions.

We have investigated the case when the variables corresponding to the
first and to the second stage can be separated. Next, we shall consider
a more general case. To this end we introduce new systems of
assumptions.
i.4 a. $X^1$, $X^2$ are convex sets, $X^2$ moreover compact,

       $M_1 = \sup_{x^2(1), x^2(2) \in X^2} \|x^2(1) - x^2(2)\|$,

    b. for some $\varepsilon > 0$ and every $x^1 \in X^1$, $g_i^2(x, z^1)$, $i = 1, 2, \dots, l_2$
       are convex functions on $X^2 \times Z^1(\varepsilon)$,

    c. for $x^1 \in X^1$, $z^1 \in Z^1(\varepsilon)$, $\mathcal{K}^2(x^1, z^1)$ is a nonempty, compact
       set,

    d. $C = M_1/\varepsilon_0$ for an $\varepsilon_0 \in (0, \varepsilon)$,

1.5 there exist real-valued constants D > 0, D > 0, £ > 0, di > 0, d\ > 

such that 

a. if x^ G ^ ^ 

||z^(l) — ^^(1)11 < Sy x^(l) = [x^, x^] fulfil the inequalities 

9i{x^{l), ^^1)) < 0 foj* every z € {1, . . . , /2}, 

^^(1)) > 0 for at least one i G {1, . . . , h}, 

then for i G {1, . . . , h}, 

gf{x\l)^ -z\l)) - gUx\l)^ z\l)) < D\\z\l) - .-^1)11, 

and simultaneously there exists x^(0) == x^(0, ^^(1), 

||x^(0)|| = 1 such that x^(2) = [x^, ir^(2)], x^(2) = x dx^(O) 

G X^ for every d G (0, di) and, moreover, for i G {1, . . . , / 2 } 

gfix^il), z-i(l)) - gf{x\2), zHl)) > D\\Ai) - AM, 

b. for every x^ £ X^ ^ z^ £ Zq{s)^ /C^(x\ z^) is a nonempty, com- 
pact set, 

c. C=§. 

{Zq denotes the convex hull of Zq{£)^ e > 0 the ^-neighbourhood 

of Zl) 

Proposition 2. If the system of the assumptions i.4 or i.5 is fulfilled, then

$$\Delta[\mathcal{K}^2(x^1, z^1(1)), \mathcal{K}^2(x^1, z^1(2))] \le C\|z^1(1) - z^1(2)\|$$

for every $x^1 \in X^1$, $z^1(1), z^1(2) \in Z^1$.

Proof. If the assumptions i.4 are fulfilled, then the assertion of Proposition
2 is only a slightly generalized assertion of [3] (Lemma 1). Since the proofs of
the original assertion and the generalized one are very similar to each other, it
remains only to prove the assertion of Proposition 2 under the assumptions
i.5. To this end, let $x^1 \in X^1$, $x^2(1) \in X^2$, $z^1(1), z^1(2) \in Z^1$ be arbitrary
such that $x^2(1) \in \mathcal{K}^2(x^1, z^1(1))$. According to the fact that $Z_0^1(\varepsilon)$ is a convex
set there exist points f \ . . . , G ^o(^) that 

= ^^2)) 11^*^ — £•^■‘■^11 < e, 

= Ajz(l) + (1 “ for some Xj G (0, 1), j = I, 2, . . . , k. 

Two cases can happen: 

a. gf{x^{\), f2) < 0 for every z G {1, 2, . . . , / 2 }, x^(l) := [x\ ^^(l)], 

b. P) > 0 for at least one j G {1, 2, . . . , / 2 }. 

Evidently, if a. happens, then a^^(l) G P). If the case b. happens, 

then according to i.5 a there exists x^(0) == x^{0, x^(l), P) such that 

\\x\0)\\ = 1, x^2) = x^{l) + dx\0) G for d G (0, di) 

and, moreover, the following inequalities are fulfilled for P(2) = [x^, x^(2)] 

gfiPil), z^) ^ y?(x2(l), .1(1)) < D||.i(l) - .-2||, 

gUP{l). z^) - y?(x2(2), P) > D\\x^{l) - x2(2)||, z = 1, 2, . . . , h. 

However, we can successively obtain from these inequalities that 

gUP{2l P) < gU^\l). P) - ^||x2(2) - x2(l)l| < 

zH^)) d- ^lk'(l) - ^')ll - D|k2(l) - x^{2)\l 

i — 1 , . . . , /2 • 

Consequently, if ||a?^(2) — a?^(l)|| = ^|| 2 :i(l) — P\\, then x^(2) G IC'^{x^^P). 
Since x^(l) G /C^(xi, z^) was an arbitrary point we can see that 

A[IC\x\ z^l)), IC\x\P)]<C\\z\l)-P\l 

Replacing successively, ^:i(l) := F, P := jf = 2, . . . , A: — 1 and em- 
ploying the fact that the Hausdorff distance is a metric in the space of the 
closed subsets of En 2 (see e.g. [12]) we obtain the assertion of Proposition 
2. 



Proposition 1 and Proposition 2 introduce the assumptions under which,
for every $x^1 \in X^1$, $\mathcal{K}^2(x^1, z^1)$ is a Lipschitz multifunction on $Z^1$ (with
respect to the Hausdorff distance). Employing, furthermore, the results of [5],
[6] we can see that this property together with some additional assumptions
on the probability measure and the objective function guarantees the validity
of the inequality

\(p^{F^^) - (p^{G)\ < C'o(sup \F{z^) - G(zi)l)^, Co > 0 a constant 
for a ("near" to $F^{\xi^1}(\cdot)$) $s_1$-dimensional distribution function $G(\cdot)$.
Furthermore, if we interchange the position of $x^1$ and $z^1$ in Propositions 1 and 2,
then according to [7] we can employ these results (together with some
additional assumptions on the objective functions) to guarantee the "best" possible
convergence rate of empirical estimates of the optimal value (for independent and
some weakly dependent random samples) and a "nearly best" rate of convergence
for some other types of weakly dependent random samples.
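
The quantity $\sup |F(z) - G(z)|$ in such estimates is the Kolmogorov distance named in the key words. When $G$ is an empirical distribution function it can be computed exactly from the sorted sample. The sketch below covers the scalar case only and assumes a continuous reference distribution; the function names and the uniform test distribution are our own illustrative choices.

```python
import random


def kolmogorov_distance(sample, cdf):
    """sup_z |F_n(z) - F(z)| for the empirical df F_n of a scalar sample
    and a continuous distribution function F given by `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        # F_n jumps from (i-1)/n to i/n at x, so both sides of the jump count.
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d


def uniform_cdf(z):
    """Distribution function of the uniform distribution on [0, 1]."""
    return min(1.0, max(0.0, z))


rng = random.Random(0)
distances = {
    n: kolmogorov_distance([rng.random() for _ in range(n)], uniform_cdf)
    for n in (100, 10000)
}
```

For independent samples the distance typically decays like $n^{-1/2}$, which is the mechanism behind convergence rates of empirical estimates of the optimal value.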

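3.2 Stochastic Programming Problems with Individual
Probability Constraints

In case II we shall restrict ourselves to the case when $s_1 = l_1$ and

$$g_i^1(x^1, z^1) = f_i^1(x^1) - z_i^1, \quad i = 1, 2, \dots, l_1, \quad z^1 = (z_1^1, z_2^1, \dots, z_{l_1}^1), \tag{18}$$

where $f_i^1(x^1)$, $i = 1, \dots, l_1$ are functions defined on $E_{n_1}$. Evidently, then

$$\bar{X}_{F^{\xi^1}}(\alpha) = \bigcap_{i=1}^{l_1} X^i_{F_i^{\xi^1}}(\alpha_i), \tag{19}$$

$$X^i_{F_i^{\xi^1}}(\alpha_i) = \{x^1 \in X^1 : P_{F_i^{\xi^1}}\{\omega : f_i^1(x^1) \le \xi_i^1(\omega)\} \ge \alpha_i\}, \quad i = 1, \dots, l_1,$$

where $F_i^{\xi^1}(z_i^1)$, $z_i^1 \in E_1$, and $P_{F_i^{\xi^1}}(\cdot)$, $i = 1, \dots, l_1$, denote the one-dimensional marginal
distribution functions and the corresponding probability measures. If we
define the new multifunctions $\mathcal{K}_i^1(z_i^1)$, $i = 1, \dots, l_1$ and the quantiles $k_{F_i^{\xi^1}}(\alpha_i)$,
$\alpha_i \in (0, 1)$, $i = 1, \dots, l_1$, by

$$\mathcal{K}_i^1(z_i^1) = \{x^1 \in X^1 : f_i^1(x^1) \le z_i^1\}, \quad z_i^1 \in E_1, \quad i = 1, \dots, l_1,$$

$$k_{F_i^{\xi^1}}(\alpha_i) = \sup\{z_i^1 : P_{F_i^{\xi^1}}\{\omega : z_i^1 \le \xi_i^1(\omega)\} \ge \alpha_i\}, \quad i = 1, \dots, l_1,$$

then

$$X^i_{F_i^{\xi^1}}(\alpha_i) = \mathcal{K}_i^1(k_{F_i^{\xi^1}}(\alpha_i)). \tag{20}$$

Furthermore, if $\mathcal{K}^1(z^1)$, $z^1 = (z_1^1, \dots, z_{l_1}^1)$ is defined by the relation

$$\mathcal{K}^1(z^1) = \bigcap_{i=1}^{l_1} \mathcal{K}_i^1(z_i^1),$$

then also

$$\bar{X}_{F^{\xi^1}}(\alpha) = \mathcal{K}^1(k_{F^{\xi^1}}(\alpha)), \quad \text{where } k_{F^{\xi^1}}(\alpha) = (k_{F_1^{\xi^1}}(\alpha_1), \dots, k_{F_{l_1}^{\xi^1}}(\alpha_{l_1})). \tag{21}$$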
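
Relations (20) and (21) reduce the probability constraints to deterministic ones evaluated at quantiles. For an empirical measure the quantile is an order statistic, as the following sketch shows. It is an illustration under our own assumptions: the probability is replaced by a relative frequency, and the names `upper_quantile` and `feasible` as well as the test data are hypothetical.

```python
import math


def upper_quantile(sample, alpha):
    """k(alpha) = sup{z : P_n{z <= xi} >= alpha} for the empirical measure P_n.

    At least m = ceil(alpha * n) sample points must be >= z, so the supremum
    is the m-th largest observation.
    """
    n = len(sample)
    m = max(1, math.ceil(alpha * n - 1e-12))   # small slack against rounding
    return sorted(sample, reverse=True)[m - 1]


def feasible(x, fs, samples, alphas):
    """x lies in the individual-constraint set iff f_i(x) <= k_i(alpha_i) for all i."""
    return all(
        f(x) <= upper_quantile(s, a) for f, s, a in zip(fs, samples, alphas)
    )


k = upper_quantile([1, 2, 3, 4], 0.5)
```

In this toy case two of the four observations are at least 3, so the constraint $P\{z \le \xi\} \ge 0.5$ holds exactly up to $z = 3$, and a point $x$ with $f(x) = x$ is feasible iff $x \le 3$.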

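If $G(z^1)$, $z^1 \in E_{l_1}$ is an arbitrary $l_1$-dimensional distribution function,
then to estimate $\Delta[\bar{X}_{F^{\xi^1}}(\alpha), \bar{X}_G(\alpha)]$ let for $\delta_i > 0$, $i = 1, \dots, l_1$, $\varepsilon > 0$

$$\underline{F}_{i,\delta_i}(z_i^1) = F_i^{\xi^1}(z_i^1 - \delta_i), \qquad \bar{F}_{i,\delta_i}(z_i^1) = F_i^{\xi^1}(z_i^1 + \delta_i), \tag{22}$$

$$Y = \prod_{i=1}^{l_1} Y_i, \qquad Y_i = (k_{F_i^{\xi^1}}(\alpha_i) - \delta_i - \varepsilon,\ k_{F_i^{\xi^1}}(\alpha_i) + \delta_i + \varepsilon). \tag{23}$$

If we substitute into (13)

$$n := n_1, \quad l := l_1, \quad Y := Y, \quad X := X^1, \quad f_i^*(x) := f_i^1(x^1), \quad i = 1, \dots, l_1, \tag{24}$$

then the following assertion holds.

Proposition 3. Let $\delta_i > 0$, $i = 1, \dots, l_1$, $\varepsilon > 0$ be arbitrary. Let, moreover,
the relation (24) be fulfilled. If

1. at least one of the systems of the assumptions i.1, i.2, i.3 is fulfilled,

2. for $i = 1, \dots, l_1$, $F_i^{\xi^1}(z_i^1)$ is an increasing function on $Y_i$,

3. $G(z^1)$ is an arbitrary $l_1$-dimensional distribution function such that

   $G_i(z_i^1) \in [\underline{F}_{i,\delta_i}(z_i^1), \bar{F}_{i,\delta_i}(z_i^1)]$ for $z_i^1 \in Y_i$, $i = 1, 2, \dots, l_1$,

then

$$\Delta[\bar{X}_{F^{\xi^1}}(\alpha), \bar{X}_G(\alpha)] \le C\|k_{F^{\xi^1}}(\alpha) - k_G(\alpha)\| \le C \sum_{i=1}^{l_1} \delta_i. \tag{25}$$

($G_i(z_i^1)$, $i = 1, 2, \dots, l_1$ denote the one-dimensional marginal distribution
functions corresponding to $G(z^1)$.)

Proof. If we substitute the relation (24) into (13), then the assertion of
Proposition 3 follows immediately from the assertion of Lemma 1 and the
relations (20), (21).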

If the one-dimensional marginal distribution functions are absolutely
continuous with respect to the one-dimensional Lebesgue measure, we obtain
the following assertion.

Corollary. Let $\varepsilon > 0$, $\delta_i > 0$, $i = 1, 2, \dots, l_1$. Let, moreover, the relation
(24) be satisfied. If

1. at least one of the systems of the assumptions i.1, i.2, i.3 is fulfilled,

2. for $i = 1, \dots, l_1$, $P_{F_i^{\xi^1}}(\cdot)$ are absolutely continuous (with respect to
   the Lebesgue measure in $E_1$) probability measures such that the
   corresponding probability densities $h_i(z_i^1)$ fulfil the inequality

   $h_i(z_i^1) \ge \vartheta_i$ for every $z_i^1 \in Y_i$ and some $\vartheta_i > 0$,

then

$$\Delta[\bar{X}_{F^{\xi^1}}(\alpha), \bar{X}_G(\alpha)] \le C \sum_{i=1}^{l_1} \frac{2 \sup_{Y_i} |F_i^{\xi^1}(z_i^1) - G_i(z_i^1)|}{\vartheta_i},$$

whenever G(-) is an /i -dimensional distribution function with one dimension- 
al marginals G,( ) such that 

sup|i?;f‘(zi)-G,(zi)| 
y* 



4. c^;,.( 
$(\varepsilon)$, $i = 1, \dots, l_1$, $\varepsilon > 0$ denote the support and $\varepsilon$-neighbourhood

* >1 
(of the . 1 ) corresponding to distribution functions Gj( ) and . 

Proof. The assertion of the Corollary follows from the assertion of Proposition
3. Namely, if we set

$$\delta_i = \frac{2 \sup |F_i^{\xi^1}(z_i^1) - G_i(z_i^1)|}{\vartheta_i}, \quad i = 1, \dots, l_1,$$

then the assumptions of Proposition 3 are fulfilled.

Employing the results of [10] we can see that (under some special
assumptions on the objective functions and on the probability measure) we can
obtain

$$|\varphi(F^{\xi^1}, \alpha) - \varphi(G, \alpha)| \le C_0 \sum_{i=1}^{l_1} \sup |F_i^{\xi^1}(z_i^1) - G_i(z_i^1)|, \quad C_0 > 0 \text{ a constant},$$

where $G(\cdot)$ is a ("near" to $F^{\xi^1}(\cdot)$) $l_1$-dimensional distribution function with
the marginals $G_i(\cdot)$, $i = 1, \dots, l_1$. Furthermore, under rather general
assumptions on the objective function, according to [7] we can obtain the "best"
possible convergence rate for empirical estimates of the optimal value (in the
case of independent and some types of weakly dependent random samples) and
a "nearly best" rate of convergence for some other types of weakly dependent
random samples.

3.3 Stochastic Programming Problems with Joint 
Probability Constraints 

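To investigate the stability of the chance constrained stochastic programming
problems we (again) restrict ourselves to the case $s_1 = l_1$ and, moreover, to the
relation (18). Evidently, in this case

$$X_{F^{\xi^1}}(\alpha) = \{x^1 \in X^1 : P_{F^{\xi^1}}\{\omega : f_i^1(x^1) \le \xi_i^1(\omega),\ i = 1, 2, \dots, l_1\} \ge \alpha\}.$$

To introduce a suitable system of assumptions we first define for $\varepsilon > 0$,
$\beta \in (\alpha - \varepsilon, \alpha + \varepsilon)$, $\alpha \in (0, 1)$, $z^1 \in E_{l_1}$ the following sets:

$$Z_{F^{\xi^1}}(\beta) = \{z^1 \in E_{l_1} : P_{F^{\xi^1}}\{\omega : z^1 \le \xi^1(\omega) \text{ componentwise}\} = \beta\},$$

$$\bar{Z}_{F^{\xi^1}}(\beta) = \{z^1 \in E_{l_1} : P_{F^{\xi^1}}\{\omega : z^1 \le \xi^1(\omega) \text{ componentwise}\} \ge \beta\},$$

$$\mathcal{K}^1(z^1) = \{x^1 \in X^1 : f_i^1(x^1) \le z_i^1,\ i = 1, \dots, l_1,\ z^1 = (z_1^1, \dots, z_{l_1}^1)\},$$

and by the symbol $Z_{F^{\xi^1}}(\beta, \delta)$ we denote the $\delta$-neighbourhood of $Z_{F^{\xi^1}}(\beta)$.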
ii.1 $P_{F^{\xi^1}}(\cdot)$ is absolutely continuous with respect to the Lebesgue measure
in $E_{l_1}$. We denote by $h(z^1)$ the corresponding probability density.

11.2 there exists a constant > 0 and for every /? G {a — a), G 

(/?) a point 6 (/?), - ^^|| < \/2T(|7)’^ such that 

7?i < h{z^) forz^ G Z~^,{I3, z\ S'), S' > 2y/h(§f)K 

= {z^ €Ei, €B(S'), 

z^ < z^ componentwise}, 

(B{6 ) denotes 6- neighbourhood of 0 G 

11.3 there exists a constant Lp^i > 0 such that for P E {a — e, a e) 

z^i G Zpii (/?, s'), s' > componentwise => 

> Lp^i\\z^ -z^\\. 

: z^ < componentwise}. 

The next two assertions express the (underlying mathematical) relationship
between the stability of the deterministic set (corresponding to the inner
problem of two-stage stochastic programming problems) and the stability of
the constraints in the chance constrained case.

Lemma 2. Let £ > 0, P E {o( — e, a, ), a G (0, 1) be arbitrary, > 0. Let, 
moreover, C be a real-valued constant such that 

A[^^(2*(1)), /C*(z^( 2))] < c'||z*(l)-z^(2)|| for z^(l), z^(2) G Zpii{/3, 6). 
If 

1. the assumptions ii.l and ii.2 are fulfilled, 6 >2Vhi^)l then 
A[Xf,.(a-e), (« + £)] < 2C'A^s/h, 

Vi 



2. the assumptions ii.l and ii.3 are fulfilled, S > — , then 






(a — £“), Xp^\ (a -f e)] < 2(7 






Proof. First, we shall deal with the case 1. To this end, let 

e), z^ = {z\, Z2, Zj^) be arbitrary. According to the assumption ii. 2 

there exist points z, z \ z E — e) such that \\z — z^\\ < y/h{^)'^ , 
z' = z'(z) = z'i = Z,-{^)^, llz-z'll =

\\z' — z^W < 2y/h{^)^ . And, moreover, 

Pjp^i {(jJ z < componentwise} > Pp^i z < componentwise} 

+P^^i {(j^ z < < z componentwise} >»-*£: + 



Consequently, z e Zp^l (a-f £). Since, evidently, (/^) C Zp(^ \ (/? ) for 
every /? < /?, /?, /? G (0, 1) we can see that 

A[Zp^i{a-e), Zp(i(a + £)] < 2\/^(^)'T. 

However, according to the last inequality and to the fact that that for /? G 

( 0 , 1 ) 

e e Zp^i{a) 

the following implication follows from the assumptions 

x^(l) G Xp^i (a — e) => there exist x^{2) G Xp^i {a + e) such that 



Since Xp^i (a + e) C Xp^i {a — e) the first assertion of Lemma 2 is valid. 

The proof of the second assertion of Lemma 2 is very similar and, conse- 
quently, we omit it. 

To introduce the next assertion, let 

U{F^\a) = {z^ £Ei, :F«‘’‘=(z')e («-£,« + £}}. 

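Lemma 3. Let $\alpha, \varepsilon > 0$, $\alpha - \varepsilon$, $\alpha + \varepsilon \in (0, 1)$. Let, moreover,
$X_{F^{\xi^1}}(\alpha - \varepsilon)$, $X_{F^{\xi^1}}(\alpha + \varepsilon)$ be nonempty, compact sets. If ii.1 is fulfilled and if

1. $\Delta[X_{F^{\xi^1}}(\alpha - \varepsilon), X_{F^{\xi^1}}(\alpha + \varepsilon)] \le d$ for some $d > 0$,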

2. G{z) is an arbitrary /i -dimensional distribution function such that 

eU{F^\a), 

then 

A[Xp,i{a),XG{a)]<d. 

{G^{z^) = PgW ' componentwise}.) 

Proof. First, we shall prove the validity of the relation 

Xpo (a + e) C Xoia) C Xp^i {a - e). 



(26) 
(28)



Evidently, to prove the last relation it is sufficient to prove 

Pg{<^ • componentwise} — E => 

Pp^i {u : componentwise } G (of — e, a -|- e). (27) 

We shall prove the relation (27) by contradiction. To this end we shall
assume that there exists $z^1 \in Z_{F^{\xi^1}}$ such that simultaneously

Pg{^ * z^ < ^^(a;) componentwise) = a and 
PpO • z^ < ^^(a;) componentwise) ^ (a — e, a -i- e). 

Of course then, just one of the following assertions must hold 

a. Pp^i {u : z^ < ^^(c*;) componentwise) < a — e, 

b. Pp^i {uj \ z^ < ^^(a;) componentwise) > a -\-e. 

Let us first consider the case a. It means that

Pg{^ z^ ^ ^H^) componentwise) = ot and simultaneously 
Pp^i {uj : z^ < ^^(u^) componentwise }< a — e. 

It follows from the assumptions that then there exists z^ G P/i, z^ < z^ 
componentwise z^ such that 

PpO £ ^H^) componentwise } = a — e. 

However then, it follows from the assumptions that 

Pg{^ • < ^n^) componentwise) > a — and simultanously 

Pg{u; : z^ < ^^(w) componentwise) < a — ^e. 



(29) 



Since $\bar{z}^1 \le z^1$ componentwise, $\bar{z}^1 \ne z^1$, the last inequality contradicts
the first relation in (28). It remains to consider the case b. Since
the corresponding proof is very similar to the first one, we can omit it.
Consequently, we have proven the relation (26). Since, furthermore, evidently

d" ^) C ^ pO (^) C ^ pO ““ ^)) 

the assertion of Lemma 3 follows already immediately from the properties of 
the Hausdorff distance. 

If we substitute n := rii, / := /i, 7 := IC*{y) := f*{x) := 

fi(x^) :=, z = 1, . . ., l\ in i.l, i.2, i.3, then we can obtain: 

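Proposition 4. Let $\alpha \in (0, 1)$, $\varepsilon > 0$, $\alpha - \varepsilon$, $\alpha + \varepsilon \in (0, 1)$. Let, moreover,
relation (18) be fulfilled and $X_{F^{\xi^1}}(\alpha + \varepsilon)$, $X_{F^{\xi^1}}(\alpha - \varepsilon)$ be nonempty, compact
sets. If $G(z)$ is an arbitrary $l_1$-dimensional distribution function such that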

sup|F«'(zi) -G(z^)| < 

and, moreover, at least one of the assumptions i.l, i.2, i.3 is fulfilled, then 
1. the assumptions ii.1 and ii.2 imply



A[X^,.(a), Xg(cv)]<4C 



4snp\F^\z^)-G{z^)\ 



Vh 



2. the assumptions ii.1 and ii.3 imply

A[X,..(.), Xa(a)] < 

Proof. First, it follows from the relation (18) and Lemma 1 that 

A[^(z'(l)), /C(z'(2))] < C\\z\\) - z\2)\\ for every ;.'(!), z'(2) G Z'. 

The assertion of Proposition 4 then follows from the last relation, from
Lemma 2 and Lemma 3 and from the properties of probability measures.

We can see that (supplementing the problem with some assumptions on the
objective function and the probability measure) we can obtain

|^(F^ , a) — <^(G, a)\ < Co(sup \F^ {z^) - G(2r^)|)^, Co > 0 a constant. 

If the assumptions ii.1 and ii.3 are fulfilled, then (under some
special additional assumptions) we can also obtain

$$|\varphi(F^{\xi^1}, \alpha) - \varphi(G, \alpha)| \le C_0 \sup |F^{\xi^1}(z^1) - G(z^1)|, \quad C_0 > 0 \text{ a constant},$$

and, furthermore, according to [7], under rather general assumptions on the
objective function we can also obtain the "best" possible convergence rate for
empirical estimates of the optimal value (in the case of independent and some
weakly dependent random samples) and a "nearly best" rate of convergence for
some other types of weakly dependent random samples.
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Approximation of Extremum Problems 
with Probability Cost Functionals 



Riho Lepp ^ 

Institute of Cybernetics, Akadeemia 21, EE-0026 Tallinn, Estonia 



Abstract. The problem of maximization of the probability of a reliability level 
under integrable decision rules is approximated by a sequence of finite dimen- 
sional problems with discrete distributions. Convergence conditions of the ap- 
proximation are presented. 

Keywords. Probability functional, finite dimensional approximation, discrete 
convergence 



1 Introduction and Problem Formulation 

Usually the solution of a stochastic program is considered to be a deterministic
vector. This is the well-known approach in one- and two-stage programming.
Still, there exist models in which the solution is determined as a function of
the random parameter. In the latter case the class of solution functions (decision
rules) should be specified in advance (as measurable or summable or continuous
or linear or constant functions, see, e.g., [5]). Moreover, two-stage stochastic
programs are equivalent to an extremum problem in a function space (see, e.g., 
[3] - the equivalence of linear programs in [10] - the equivalence of convex 
ones in L^). 

To ensure a certain level of reliability for the solution, it has become a widespread
approach to introduce probabilistic (chance) constraints into the model. The
stability analysis of chance constrained problems is rather complicated due to
inconvenient properties of the probability function

    $u(x) = P\{s \mid f(x,s) \le t\}$.    (1)

Here $f(x,s)$ is a real-valued function defined on $R^r \times R^r$, $t$ is a fixed level of
reliability, $s$ is a random vector and $P$ denotes the probability. Note that the
function $u(x)$ is never convex; only in some cases (e.g., $f(x,s)$ linear in $s$ and
the distribution of $s$ normal) is it quasiconvex. Varied examples and models
with the probability function $u(x)$ and its "inverse," the quantile function

    $\min\{t \mid P\{s \mid f(x,s) \le t\} \ge \alpha\}$, $0 < \alpha < 1$,

^also, Estonian National Defence and Public Service Academy, 61 Kase Street, EE-0020,
Tallinn, Estonia

are presented in [7], Ch. I. In Ch. II the authors present some models with
such a complicated structure that we are forced to look for a solution $x$ from
a certain class of strategies; that is, the solution $x$ itself depends on the
random parameter $s$, $x = x(s)$.
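
The probability function (1) and the quantile function above can be estimated from a sample of the random parameter. The following is only a minimal sketch under stated assumptions: the helper names `probability_function` and `quantile_function`, the sample-based (empirical) estimation, and the example function `f` are illustrative and do not come from the text.

```python
import random


def probability_function(f, x, samples, t):
    """Empirical estimate of u(x) = P{s | f(x, s) <= t} from samples of s."""
    hits = sum(1 for s in samples if f(x, s) <= t)
    return hits / len(samples)


def quantile_function(f, x, samples, alpha):
    """Empirical estimate of min{t | P{s | f(x, s) <= t} >= alpha}, 0 < alpha < 1."""
    values = sorted(f(x, s) for s in samples)
    n = len(values)
    # The k-th smallest value is the first level t whose empirical
    # probability reaches alpha, because (k - 1) / n is still below alpha.
    for k, value in enumerate(values, start=1):
        if k / n >= alpha:
            return value
    return values[-1]


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

    def f(x, s):
        # Hypothetical example: f linear in s, s normally distributed.
        return x * s

    print(probability_function(f, 2.0, samples, t=1.0))
    print(quantile_function(f, 2.0, samples, alpha=0.9))
```

The two estimators are "inverse" to each other in the sense used above: evaluating the empirical probability at the returned quantile gives a value of at least `alpha`.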

This class of probability functions was introduced to stochastic programming
by Ernst Raik, and lower semicontinuity and continuity properties in Lebesgue
$L^p$-spaces were studied in [9].

In this paper we will consider the maximization of the probability functional
$v(x)$:

    $\max_{x \in C} v(x) = \max_{x \in C} P\{s \mid f(x(s),s) \le t\}$,    (2)

where $f : R^r \times R^r \to R^1$, $t$ is a fixed level of reliability, $s$ is a random parameter
with bounded support $S$, $s \in S \subset R^r$, and with atomless distribution $\sigma$,

    $\sigma\{s \mid |s - s_0| = \mathrm{const}\} = 0 \quad \forall s_0 \in R^r$.    (3)

For technical reasons it will be assumed that the decision rule $x(s)$ is
a $\sigma$-integrable function, $x \in L^1(S, \Sigma, \sigma) = L^1(\sigma)$, and that $C$ is a compact
constraint set in $L^1(\sigma)$.

Since problem (2) is formulated in the function space $L^1(\sigma)$ of $\sigma$-integrable
functions, the first step in its approximate solution is the approximation
step: replacing the initial problem (2) by a sequence of finite dimensional
optimization problems with increasing dimension.

We will approximate the initial measure $\sigma$ by a sequence $\{(m_n, s_n)\}$ of
discrete measures which converge weakly to $\sigma$. The use of weak convergence
of discrete measures in stochastic programming has its disadvantages and
advantages. An example in [11] shows that, in general, the stability of a probability
function with respect to weak convergence cannot be expected without additional
smoothness assumptions on the measure $\sigma$. This is one of the reasons why we
should use only continuous measures with the property (3). An advantage of
weak convergence is that it allows us to apply in the approximation,
instead of conditional means, the simpler grid point approximation scheme.

Since the functional $v(x)$ is not convex, we are not able to exploit in the
discrete stability analysis of problem (2) the more convenient weak topology,
but only the strong (norm) topology. As the first step, we will
approximate $v(x)$ so that the discrete analogue of the continuous convergence
of the sequence of approximate functionals is guaranteed.

Schemes of stability analysis (e.g., finite dimensional approximations) of
extremum problems in Banach spaces require a certain kind of compactness from
the sequence of solutions of "approximate" problems. Assuming that the
constraint set $C$ is compact in $L^1(\sigma)$, we will, as the second step, approximate the
set $C$ by a sequence of finite dimensional sets $\{C_n\}$ with increasing dimension
so that the sequence of solutions of approximate problems is compact in a
certain (discrete convergence) sense in $L^1(\sigma)$. Then the approximation scheme for
the discrete approximation of (2) will follow established schemes of approximation
of extremum problems in Banach spaces, see e.g. [2], [4], [13], [8].

Redefine the functional $v(x)$ by using the Heaviside zero-one function $\chi$:

    $v(x) = \int_S \chi(t - f(x(s),s))\,\sigma(ds)$,    (4)

where $\chi(\tau) = 1$ for $\tau \ge 0$ and $\chi(\tau) = 0$ for $\tau < 0$.

In the next two sections we will assume that the function $f(x,s)$ is continuous
in $(x,s)$ and satisfies the following growth and "platform" conditions:

    $|f(x,s)| \le a(s) + \alpha |x|$, $a \in L^1(\sigma)$, $\alpha > 0$.    (5)

    $\sigma\{s \mid f(x,s) = \mathrm{const}\} = 0 \quad \forall (x,s) \in R^r \times S$.    (6)

The continuity assumption is technical and serves to simplify the description of the
approximation scheme below. The growth condition (5) is essential: without it
the superposition operator $F(x) = f(x(s),s)$ will not map an element of $L^1(\sigma)$ into
$L^1(\sigma)$ (it is not even defined). Condition (6) means that the function $f(x,s)$ should
not have horizontal platforms with positive measure.

The constraint set $C$ is assumed to be a set of integrable functions $x \in L^1(\sigma)$
with the properties

    $\int_S |x(s)|\,\sigma(ds) \le M < \infty \quad \forall x \in C$    (7)

for some $M > 0$ ($C$ is bounded in $L^1(\sigma)$);

    $\int_D |x(s)|\,\sigma(ds) \le K\sigma(D) \quad \forall x \in C$ and $\forall D \in \Sigma$    (8)

for some $K > 0$;

    $(x(s) - x(t), s - t) \ge 0$ for almost all (a.a.) $s, t \in S$    (9)

(functions $x \in C$ are monotone almost everywhere).

Conditions (7), (8) guarantee that the set $C$ is weakly compact (i.e., compact
in the $\sigma(L^1, L^\infty)$-topology, see, e.g., [6], Ch. 9.1.2). Condition (9) guarantees,
following [1], Lemma 3, that together with conditions (7), (8) the set $C$ is
strongly compact in $L^1(\sigma)$. Then, following [9], we can conclude that assumptions
(5) - (9) together with the atomless assumption on the measure $\sigma$ guarantee the
existence of a solution of problem (2) in the Banach space $L^1(\sigma)$ of $\sigma$-integrable
functions (the cost functional $v(x)$ is continuous and the constraint set $C$ is
compact in $L^1(\sigma)$).

In the last section we will approximate an unconstrained maximum of the
probability functional (4), where the decision rule $x(s)$ is a function integrable
with the $p$-th power, $x \in L^p(\sigma)$, $1 < p < \infty$, and the probability functional $v(x)$
vanishes at infinity.



2 Discrete Approximation in $L^p$-spaces

Let the initial probability measure $\sigma$ be approximated by a sequence of discrete
measures $\{(m_n, s_n)\}$ with weights $m_{in}$ at points $s_{in}$, $i = 1, \ldots, n$,
$n \in N = \{1, 2, 3, \ldots\}$ (weak convergence of discrete measures):

    $\sum_{i=1}^n h(s_{in}) m_{in} \to \int_S h(s)\,\sigma(ds)$, $n \in N$,    (10)

for any function $h(s)$ continuous on $S$.

Since approximate problems will be defined in $R^{rn}$, we should, in order to analyze
the stability of approximations, define a connection operator between the spaces
$L^1(\sigma)$ and $R^{rn}$. In $L^p$-spaces, $1 \le p < \infty$, systems of connection operators
$\mathcal{P} = \{p_n\}$ should be defined in a piecewise integral form (as conditional means):

    $(p_n x)_{in} = \sigma(A_{in})^{-1} \int_{A_{in}} x(s)\,\sigma(ds)$.    (11)

Here the sets $A_{in}$, $i = 1, \ldots, n$, $n \in N$, that define the connection operators (11) satisfy
the following conditions A1) - A7):

A1) $\sigma(A_{in}) > 0$;

A2) $A_{in} \cap A_{jn} = \emptyset$, $i \ne j$;

A3) $\bigcup_{i=1}^n A_{in} = S$;

A4) $\sum_{i=1}^n |m_{in} - \sigma(A_{in})| \to 0$, $n \in N$;

A5) $\max_i \operatorname{diam} A_{in} \to 0$, $n \in N$;

A6) $s_{in} \in A_{in}$;

A7) $\sigma(\operatorname{int} A_{in}) = \sigma(A_{in}) = \sigma(\operatorname{cl} A_{in})$,

where "int" and "cl" denote the topological interior and closure of a set, respectively.

Remark 1 Weak convergence (10) is equivalent to the existence of a partition $\{\mathcal{A}_n\}$ of $S$,
$\mathcal{A}_n = \{A_{1n}, \ldots, A_{nn}\}$, with properties A1) - A7), see [12].

Remark 2 The collection of sets $\{A_{in}\}$ with the property A7) constitutes an algebra
$\Sigma_0 \subset \Sigma$, and if $S = [0,1]$ and $\sigma$ is Lebesgue measure on $[0,1]$, then integrability
relative to $\sigma$ means Riemann integrability.

Define now the discrete convergence for the space $L^1(\sigma)$ of $\sigma$-integrable functions.

Definition 1 A sequence of vectors $\{x_n\}$, $x_n \in R^{rn}$, $\mathcal{P}$-converges (or converges
discretely) to an integrable function $x(s)$ if

    $\sum_{i=1}^n |x_{in} - (p_n x)_{in}|\, m_{in} \to 0$, $n \in N$.    (12)

Remark 3 Note that in the space $L^1(\sigma)$ of $\sigma$-integrable functions we are also
able to use the projection methods approach, defining convergence of $\{x_n\}$ to
$x(s)$ as follows:

    $\int_S |x(s) - \sum_{i=1}^n x_{in} \chi_{A_{in}}(s)|\,\sigma(ds) \to 0$, $n \in N$.

Remark 4 The projection methods approach does not work in the space $L^\infty(\sigma)$ of
essentially bounded measurable functions with the vraisup-norm topology.

Define also the discrete analogue of the weak convergence in $L^1(\sigma)$.

Definition 2 A sequence of vectors $\{x_n\}$, $x_n \in R^{rn}$, $n \in N$, $w\mathcal{P}$-converges (or
converges weakly discretely) to an integrable function $x(s)$, $x \in L^1(\sigma)$, if

    $\sum_{i=1}^n (z_{in}, x_{in}) m_{in} \to \int_S (z(s), x(s))\,\sigma(ds)$, $n \in N$,    (13)

for any sequence $\{z_n\}$ of vectors, $z_n \in R^{rn}$, $n \in N$, and function $z(s)$, $z \in L^\infty(\sigma)$,
such that

    $\max_{1 \le i \le n} |z_{in} - (p_n z)_{in}| \to 0$, $n \in N$.    (14)

Here the space $L^\infty(\sigma)$ of essentially bounded measurable functions is the topological
dual of $L^1(\sigma)$.
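
The conditional-mean connection operator (11) and the discrete convergence of Definition 1 can be illustrated numerically. The sketch below assumes the simplest setting, which is not fixed by the text: $S = [0,1]$, $\sigma$ the Lebesgue measure, a uniform partition $A_{in} = [(i-1)/n, i/n)$ with $m_{in} = 1/n$, and $s_{in}$ the cell midpoint. The function names and the midpoint quadrature used to evaluate the cell means are illustrative choices.

```python
def cell_means(x, n, sub=200):
    """(p_n x)_in: mean of x over A_in = [(i-1)/n, i/n), via a midpoint rule."""
    h = 1.0 / n
    means = []
    for i in range(n):
        points = (i * h + (j + 0.5) * h / sub for j in range(sub))
        means.append(sum(x(s) for s in points) / sub)
    return means


def grid_values(x, n):
    """Grid point approximation x_in = x(s_in), s_in the midpoint of A_in."""
    return [x((i + 0.5) / n) for i in range(n)]


def discrete_distance(x_n, x, n):
    """sum_i |x_in - (p_n x)_in| * m_in with m_in = sigma(A_in) = 1/n, cf. (12)."""
    projected = cell_means(x, n)
    return sum(abs(a - b) for a, b in zip(x_n, projected)) / n


if __name__ == "__main__":
    def x(s):
        # Hypothetical decision rule, monotone on [0, 1].
        return s * s

    for n in (10, 100, 1000):
        print(n, discrete_distance(grid_values(x, n), x, n))
```

For a continuous decision rule the printed distances shrink as `n` grows, which is the sense in which grid point values and conditional means become interchangeable.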

Remark 5 For the piecewise integral connection system (11), discrete weak
convergence (13) is equivalent to the convergence

    $\sum_{i=1}^n ((p_n z)_{in}, x_{in}) m_{in} \to \int_S (z(s), x(s))\,\sigma(ds)$, $n \in N$ $\forall z \in L^\infty(\sigma)$

(see [13]).

In order to formulate the discretized problem and to simplify the presentation,
we will assume that in the partition $\{\mathcal{A}_n\}$ of $S$, where $\mathcal{A}_n = \{A_{1n}, \ldots, A_{nn}\}$, with
properties A1) - A7), in property A4) we identify $m_{in}$ and $\sigma(A_{in})$, i.e.
$m_{in} = \sigma(A_{in})$ (e.g. squares with decreasing diagonal in $R^2$).

Discretize now the probability functional $v(x)$:

    $v_n(x_n) = \sum_{i=1}^n \chi(t - f(x_{in}, s_{in})) m_{in}$    (15)

and formulate the discretized problem:

    $\max_{x_n \in C_n} v_n(x_n) = \max_{x_n \in C_n} \sum_{i=1}^n \chi(t - f(x_{in}, s_{in})) m_{in}$,    (16)

where the constraint set $C_n$ satisfies discrete analogues of conditions (7) - (9)
imposed on the set $C$:

    $\sum_{i=1}^n |x_{in}|\, m_{in} \le M \quad \forall x_n \in C_n$,    (17)

    $\sum_{i \in I_n} |x_{in}|\, m_{in} \le K \sum_{i \in I_n} m_{in} \quad \forall x_n \in C_n$, $\forall I_n \subset \{1, 2, \ldots, n\}$,    (18)

    $\sum_{k=1}^r (x^k_{i_k n} - x^k_{j_k n})(i_k - j_k) \ge 0 \quad \forall i_k, j_k$ s.t. $i_k < j_k$,    (19)

and such that $0 < i_k, j_k \le n$, $n \in N$.
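
The discretized functional (15) and the constraints (17) - (19) can be evaluated directly once a grid is fixed. The following sketch assumes the one-dimensional case $r = 1$, in which (19) reduces to the vector $x_n$ being nondecreasing; the function names and the example data are illustrative.

```python
def chi(tau):
    """Heaviside zero-one function: 1 for tau >= 0, 0 otherwise."""
    return 1.0 if tau >= 0 else 0.0


def v_n(x_n, s_n, m_n, f, t):
    """Discretized probability functional (15)."""
    return sum(chi(t - f(x, s)) * m for x, s, m in zip(x_n, s_n, m_n))


def in_C_n(x_n, m_n, M, K):
    """Check the discrete constraints (17)-(19) for r = 1."""
    bounded = sum(abs(x) * m for x, m in zip(x_n, m_n)) <= M      # (17)
    # With all m_in > 0, (18) holds for every index set I_n exactly when
    # |x_in| <= K for each i (take singletons, then sum up).
    dominated = all(abs(x) <= K for x in x_n)                      # (18)
    monotone = all(a <= b for a, b in zip(x_n, x_n[1:]))           # (19)
    return bounded and dominated and monotone


if __name__ == "__main__":
    n = 4
    s_n = [(i + 0.5) / n for i in range(n)]   # midpoints of a uniform partition
    m_n = [1.0 / n] * n                       # m_in = sigma(A_in)
    x_n = [0.1, 0.2, 0.3, 0.4]

    def f(x, s):
        # Hypothetical f, continuous in (x, s).
        return s - x

    print(v_n(x_n, s_n, m_n, f, t=0.3), in_C_n(x_n, m_n, M=1.0, K=0.5))
```

Reducing (18) to a componentwise bound avoids enumerating all index sets $I_n$, which would otherwise grow exponentially with $n$.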

Definition 3 A sequence of sets $\{C_n\}$, $C_n \subset R^{rn}$, $n \in N$, converges to the set
$C \subset L^1(\sigma)$ in the discrete Mosco sense if

1) for any subsequence $\{x_n\}$, $n \in N' \subset N$, such that $x_n \in C_n$, from the
convergence $w\mathcal{P}\text{-}\lim x_n = x$ it follows that $x \in C$;

2) for any $x \in C$ there exists a sequence $\{x_n\}$, $x_n \in C_n$, which $\mathcal{P}$-converges
to $x$, $\mathcal{P}\text{-}\lim x_n = x$, $n \in N$.

Remark 6 If in Definition 3 the "for any" part 1) is also stated for $\mathcal{P}$-convergence
of vectors, then the sequence of sets $\{C_n\}$ is said to converge to the set $C$ in
the discrete Painlevé-Kuratowski sense.

3 Conditions of the Discrete Approximation

Denote optimal values and optimal solutions of problems (2) and (16) by $v^*$, $x^*$
and $v_n^*$, $x_n^*$, respectively.

Proposition 1 Let the function $f(x,s)$ be continuous in both variables $(x,s)$ and
satisfy growth and platform conditions (5) and (6). Then from the convergence
$\mathcal{P}\text{-}\lim x_n = x$, $n \in N$, for any monotone a.e. $x(s)$, it follows that
$v_n(x_n) \to v(x)$, $n \in N$.

Proof. Let $\mathcal{P}\text{-}\lim x_n = x$, $n \in N$, i.e., let
$\sum_{i=1}^n |x_{in} - (p_n x)_{in}|\, m_{in} \to 0$, $n \in N$. Divide the difference

    $|v_n(x_n) - v(x)| = |\sum_{i=1}^n \chi(t - f(x_{in}, s_{in})) m_{in} - \int_S \chi(t - f(x(s),s))\,\sigma(ds)|$    (20)

into two parts:

    $|v_n(x_n) - v(x)| \le |v_n(x_n) - v_n(p_n x)| + |v_n(p_n x) - v(x)|$.    (21)

Estimate the first difference $|v_n(x_n) - v_n(p_n x)|$. Let us show that for any
small $\varepsilon > 0$ there exists an index $N_1$ such that $|v_n(x_n) - v_n(p_n x)| < \varepsilon$ for
all $n > N_1$. In the space $L^1(\sigma)$ of $\sigma$-integrable functions we can approximate an
integrable function by a continuous function. Approximate first the discontinuous
integrand $\chi(t - f(x,s))$ by a continuous one. Define the continuous function $\chi_c$ in
the following way:

    $\chi_c(t - f(x,s)) = 1$, if $f(x,s) \le t$,
    $\chi_c(t - f(x,s)) = 1 - \delta^{-1}(f(x,s) - t)$, if $t < f(x,s) < t + \delta$,
    $\chi_c(t - f(x,s)) = 0$, if $f(x,s) \ge t + \delta$,

where $\delta > 0$ is chosen in such a way that

    $\int_{\{s \mid t < f(x(s),s) < t + \delta\}} \sigma(ds) < \varepsilon/6$.    (22)

Then

    $|v_n(x_n) - v_n(p_n x)| = |\sum_{i=1}^n \chi(t - f(x_{in}, s_{in})) m_{in} - \sum_{i=1}^n \chi(t - f((p_n x)_{in}, s_{in})) m_{in}| \le$

    $\le |\sum_{i=1}^n \chi(t - f(x_{in}, s_{in})) m_{in} - \sum_{i=1}^n \chi_c(t - f(x_{in}, s_{in})) m_{in}| +$

    $+ |\sum_{i=1}^n \chi_c(t - f(x_{in}, s_{in})) m_{in} - \sum_{i=1}^n \chi_c(t - f((p_n x)_{in}, s_{in})) m_{in}| +$

    $+ |\sum_{i=1}^n \chi_c(t - f((p_n x)_{in}, s_{in})) m_{in} - \sum_{i=1}^n \chi(t - f((p_n x)_{in}, s_{in})) m_{in}|$.

Due to the choice of $\delta > 0$ in (22) we can find indices $n_1$ and $n_2$ such that

    $|\sum_{i=1}^n \chi(t - f(x_{in}, s_{in})) m_{in} - \sum_{i=1}^n \chi_c(t - f(x_{in}, s_{in})) m_{in}| < \varepsilon/3$ as $n > n_1$

and

    $|\sum_{i=1}^n \chi_c(t - f((p_n x)_{in}, s_{in})) m_{in} - \sum_{i=1}^n \chi(t - f((p_n x)_{in}, s_{in})) m_{in}| < \varepsilon/3$ as $n > n_2$.

Estimate the second difference:

    $|\sum_{i=1}^n \chi_c(t - f(x_{in}, s_{in})) m_{in} - \sum_{i=1}^n \chi_c(t - f((p_n x)_{in}, s_{in})) m_{in}| \le$

    $\le \sum_{i=1}^n |\chi_c(t - f(x_{in}, s_{in})) - \chi_c(t - f((p_n x)_{in}, s_{in}))|\, m_{in}$.

The function $\chi_c(t - f(x,s))$ is uniformly continuous on $(X, S)$ (as a continuous
function on a bounded set), so, taking for a small $\varepsilon > 0$ a $\nu > 0$ such that for
all $n > n_3$ from $|x_{in} - (p_n x)_{in}| < \nu$ it follows that

    $|\chi_c(t - f(x_{in}, s_{in})) - \chi_c(t - f((p_n x)_{in}, s_{in}))| < \varepsilon/3$,

we get

    $\sum_{i=1}^n |\chi_c(t - f(x_{in}, s_{in})) - \chi_c(t - f((p_n x)_{in}, s_{in}))|\, m_{in} < \varepsilon/3$.

Consequently, for all $n > N_1 = \max\{n_1, n_2, n_3\}$ we have

    $|v_n(x_n) - v_n(p_n x)| < \varepsilon$.

Estimate the second difference $|v_n(p_n x) - v(x)|$ of the sum (21). Take a
continuous monotone function $x_c(s)$, $x_c \in C(S)$, such that $|v(x) - v(x_c)| < \varepsilon/4$
and estimate the difference $|v(x) - v_n(p_n x)|$ as follows:

    $|v(x) - v_n(p_n x)| \le |v(x) - v(x_c)| + |v(x_c) - v_n(p_n x_c)| + |v_n(p_n x_c) - v_n(p_n x)|$.

By the assumptions ($f(x,s)$ continuous in $(x,s)$, assumption (6) and $\sigma$ atomless) we
have $\sigma(S_t) = 0$, where $S_t = \{s \mid f(x_c(s),s) = t\}$. Then

    $|v(x_c) - v_n(p_n x_c)| \le |v(x_c) - v_n(p'_n x_c)| + |v_n(p'_n x_c) - v_n(p_n x_c)|$,

where

    $(p'_n x_c)_{in} = x_c(s_{in})$.    (23)

Note that for the continuous function $x_c(s)$ we are able to use, in parallel to the piecewise
integral connection system $\mathcal{P} = \{p_n\}$, the simplest connection system
$\mathcal{P}' = \{p'_n\}$ of the form (23). The systems $\mathcal{P}$ and $\mathcal{P}'$ are equivalent in the following
sense:

    $\sum_{i=1}^n |\sigma(A_{in})^{-1} \int_{A_{in}} x_c(s)\,\sigma(ds) - x_c(s_{in})|\, m_{in} \to 0$, $n \in N$.

Now

    $|v_n(p'_n x_c) - v_n(p_n x_c)| = |\sum_{i=1}^n \chi(t - f(x_c(s_{in}), s_{in})) m_{in} - \sum_{i=1}^n \chi(t - f((p_n x_c)_{in}, s_{in})) m_{in}| < \varepsilon/8$

as $n > n_4$ for some $n_4$ (the function $F(s) = \chi(t - f(x_c(s),s))$ has only
discontinuities of the first kind, and the $\sigma$-measure of its discontinuity points is zero).

Take now an index $n_5$ so large that for all $n > n_5$ we have

    $|v(x_c) - v_n(p'_n x_c)| = |\int_S \chi(t - f(x_c(s),s))\,\sigma(ds) - \sum_{i=1}^n \chi(t - f(x_c(s_{in}), s_{in})) m_{in}| < \varepsilon/8$

(approximation of a Riemann integrable function by Riemann sums). Then
$|v(x_c) - v_n(p_n x_c)| < \varepsilon/4$ as $n > \max\{n_4, n_5\}$. Since the continuous function
$x_c(s)$ was taken so that $|v(x) - v(x_c)| < \varepsilon/4$, we have

    $|v_n(p_n x_c) - v_n(p_n x)| < \varepsilon/2$

for $n > n_6$. Consequently, for $n > N_2 = \max\{n_4, n_5, n_6\}$ we have

    $|v_n(p_n x) - v(x)| < \varepsilon$

and hence, for $n > \max\{N_1, N_2\}$,

    $|v_n(x_n) - v(x)| < 2\varepsilon$.

Proposition is proved.
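
The smoothing step used in the proof of Proposition 1 can be checked numerically: replacing the zero-one function $\chi$ by a continuous surrogate changes the discrete sums by no more than the weight of the grid points in the band $t < f < t + \delta$. The sketch below assumes a linear ramp on that band, $S = [0,1]$ with Lebesgue measure and a uniform midpoint grid; these choices and all names are illustrative.

```python
def chi(tau):
    """Heaviside zero-one function: 1 for tau >= 0, 0 otherwise."""
    return 1.0 if tau >= 0 else 0.0


def chi_c(tau, delta):
    """Continuous surrogate: 1 for tau >= 0, 0 for tau <= -delta, linear between."""
    if tau >= 0:
        return 1.0
    if tau <= -delta:
        return 0.0
    return 1.0 + tau / delta


def smoothing_gap(x, f, t, delta, n):
    """Return (|sum chi*m - sum chi_c*m|, weight of the band t < f < t + delta)."""
    s_n = [(i + 0.5) / n for i in range(n)]       # midpoints, m_in = 1/n
    hard = sum(chi(t - f(x(s), s)) for s in s_n) / n
    soft = sum(chi_c(t - f(x(s), s), delta) for s in s_n) / n
    band = sum(1 for s in s_n if t < f(x(s), s) < t + delta) / n
    return abs(hard - soft), band


if __name__ == "__main__":
    # Hypothetical data: x(s) = s (monotone), f(x, s) = x + s, level t = 1.
    gap, band = smoothing_gap(lambda s: s, lambda x, s: x + s, 1.0, 0.1, 1000)
    print(gap, band)
```

Because $0 \le \chi_c - \chi \le 1$ and the two functions differ only on the band, the gap is bounded by the band weight, which is what the choice of $\delta$ in (22) controls.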

Denote sets that satisfy only conditions (7), (8) and (17), (18) by $C'$ and $C'_n$,
respectively.

Proposition 2 Let the constraint sets $C'$ and $C'_n$ satisfy conditions (7), (8) and
(17), (18), respectively. Let the discrete measures $\{(m_n, s_n)\}$ converge weakly to the
measure $\sigma$. Then the sequence of sets $\{C'_n\}$ converges to the set $C'$ in the discrete
Mosco sense.

Proof. Consider the "for any" part 1) of Definition 3. Let $w\mathcal{P}\text{-}\lim x_n = x$,
$n \in N$, i.e., let

    $|\sum_{i=1}^n ((p_n z)_{in}, x_{in}) m_{in} - \int_S (z(s), x(s))\,\sigma(ds)| \to 0$, $n \in N$ $\forall z \in L^\infty$,    (24)

and prove that $x \in C'$. Test condition (7). Assume that

    $\sum_{i=1}^n |x_{in}|\, m_{in} \le M$

for all $n \in N$. Now, assuming on the contrary that there exists a $\delta > 0$ such that
$\int_S |x(s)|\,\sigma(ds) \ge M + \delta$, we get the contradiction $M \ge M + \delta$ (the pair of
functionals $(\|x\|, \{\|x_n\|_n\})$ is weakly discretely lower semicontinuous, i.e., if
the convergence (24) holds, then

    $\liminf_{n \in N} \sum_{i=1}^n |x_{in}|\, m_{in} \ge \int_S |x(s)|\,\sigma(ds)$,

see, e.g., [13]).

Consider condition (8). Assume that

    $\sum_{i \in I_n} |x_{in}|\, m_{in} \le K \sum_{i \in I_n} m_{in} \quad \forall I_n \subset \{1, 2, 3, \ldots, n\}$ $\forall n \in N$,    (25)

and that $w\mathcal{P}\text{-}\lim x_n = x$, $n \in N$. Then from (24) and (25) we get




^in 



< a: lim 
iei 



^ j 

JS 






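Define the sequence of piecewise constant integrable functions $x_n(s) \in L^1(\sigma)$,
taking

    $x_n(s) = \sum_{i=1}^n x_{in} \chi_{A_{in}}(s)$.

Now

    $\lim_{n \in N} \int_S \chi_D(s) x_n(s)\,\sigma(ds) = \int_S \chi_D(s) x(s)\,\sigma(ds)$

(the sequence $\{x_n(s)\}$, as a sequence of piecewise constant $\sigma$-integrable functions,
converges weakly to the $\sigma$-integrable function $x(s)$, since $\chi_D \in L^\infty(\sigma)$).
Consequently,

    $\limsup_{n \in N} \int_S \sum_{i=1}^n |x_{in}| \chi_{A_{in}}(s) \chi_D(s)\,\sigma(ds) = \int_S \chi_D(s) |x(s)|\,\sigma(ds) =$

    $= \int_D |x(s)|\,\sigma(ds) \le K \limsup_{n \in N} \int_S \sum_{i=1}^n \chi_{A_{in}}(s) \chi_D(s)\,\sigma(ds) = K\sigma(D)$.

We can conclude now that weak discrete limit points of the sequence $\{x_n\}$ satisfy
conditions (7) and (8).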

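Consider the "there exists" part 2) of Definition 3. Take an $x \in C'$ that satisfies
(7) and (8) and take the $x_{in}$ in the form $x_{in} = (p_n x)_{in} = \sigma(A_{in})^{-1} \int_{A_{in}} x(s)\,\sigma(ds)$.
Consider condition (17):

    $\sum_{i=1}^n |x_{in}|\, m_{in} = \sum_{i=1}^n |\sigma(A_{in})^{-1} \int_{A_{in}} x(s)\,\sigma(ds)|\, m_{in} =$

    $= \sum_{i=1}^n |\int_{A_{in}} x(s)\,\sigma(ds)| \le \sum_{i=1}^n \int_{A_{in}} |x(s)|\,\sigma(ds) \le M$.

Consider condition (18): fix an $I_n \subset \{1, 2, \ldots, n\}$. Then

    $\sum_{i \in I_n} |x_{in}|\, m_{in} = \sum_{i \in I_n} |(p_n x)_{in}|\, m_{in} =$

    $= \sum_{i \in I_n} |\sigma(A_{in})^{-1} \int_{A_{in}} x(s)\,\sigma(ds)|\, \sigma(A_{in}) \le \sum_{i \in I_n} \int_{A_{in}} |x(s)|\,\sigma(ds) = \int_S \sum_{i \in I_n} \chi_{A_{in}}(s) |x(s)|\,\sigma(ds)$,

since $A_{in} \cap A_{jn} = \emptyset$, $i \ne j$. Now

    $\lim_{n \in N} \int_S \sum_{i \in I_n} \chi_{A_{in}}(s) |x(s)|\,\sigma(ds) = \int_D |x(s)|\,\sigma(ds) \le K\sigma(D)$

for some $D \in \Sigma$. But

    $K\sigma(D) = K \int_D \sigma(ds) = K \lim_{n \in N} \int_S \sum_{i \in I_n} \chi_{A_{in}}(s)\,\sigma(ds) = K \lim_{n \in N} \sum_{i \in I_n} m_{in}$,

since it was assumed that $\sigma(A_{in}) = m_{in}$. Proposition is proved.

Proposition 3 Let the constraint sets $C$ and $C_n$ satisfy conditions (7) - (9) and
(17) - (19), respectively. Let the discrete measures $\{(m_n, s_n)\}$ converge weakly to
the measure $\sigma$. Then the sequence of sets $\{C_n\}$ converges to the set $C$ in the discrete
Painlevé-Kuratowski sense.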

Proof. Let $\mathcal{P}\text{-}\lim x_n = x$, $n \in N$. Consider condition (9) with its discrete
analogue (19). Let $x_n \in C_n$, i.e.,

    $\sum_{k=1}^r (x^k_{i_k n} - x^k_{j_k n})(i_k - j_k) \ge 0 \quad \forall i_k, j_k$ such that $i_k < j_k$, $k = 1, \ldots, r$,

and

    $0 < i_k, j_k \le n$, $n \in N$

(remember that $x_n = (x_{1n}, x_{2n}, \ldots, x_{nn})$ and each $x_{in}$, $i = 1, \ldots, n$, is an
$r$-dimensional vector, $x_{in} = (x^1_{in}, x^2_{in}, \ldots, x^r_{in})$). Define a piecewise constant
function $(r_n x_n)(s)$ in the following way:

    $(r_n x_n)(s) = (x^1_{in}, x^2_{in}, \ldots, x^r_{in})$, if $s_k \in [(i_k - 1)/n, i_k/n)$ $\forall k \in \{1, 2, \ldots, r\}$.    (26)

Then

    $((r_n x_n)(s_{in}) - (r_n x_n)(s_{jn}), s_{in} - s_{jn}) \ge 0$

and therefore,

    $((r_n x_n)(s) - (r_n x_n)(t), s - t) \ge 0$,

where $s = (s_1, \ldots, s_r)$ and $t = (t_1, \ldots, t_r)$ are defined by (26). Assume that the
$\mathcal{P}$-limit of (discrete) monotone vectors $x_n$ is not nondecreasing on a set $A \in \Sigma$
with positive measure, $\sigma(A) > 0$, and construct a contradiction.

Assuming the contrary, we get

    $(x(s) - x(t), s - t) \le -\delta < 0 \quad \forall s, t \in A$,    (27)

but

    $((r_n x_n)(s) - (r_n x_n)(t), s - t) \ge 0$ for a.a. $s, t \in A$.    (28)

From the assumption $\mathcal{P}\text{-}\lim x_n = x$, $n \in N$, we get

    $\int_S |x(s) - \sum_{i=1}^n x_{in} \chi_{A_{in}}(s)|\,\sigma(ds) \to 0$, $n \in N$.

We can extract from the sequence $x_n(s) = \sum_{i=1}^n x_{in} \chi_{A_{in}}(s)$, $n \in N$, a subsequence
for which the difference $x(s) - x_n(s)$ converges almost surely to zero, and consequently,

    $x_n(s) \to x(s)$, $n \in N' \subset N$, almost surely.

Now inequality (27) together with inequality (28) gives us a contradiction.
Consequently, the limit of the (discrete) monotone sequence of vectors $\{x_n\}$ is an
almost everywhere monotone function $x(s)$.

Let now $x(s)$ be an almost everywhere monotone function,

    $(x(s) - x(t), s - t) \ge 0$, a.e.

Define $x_{in}$ via the projectors $p_n$:

    $x_{in} = (p_n x)_{in} = \sigma(A_{in})^{-1} \int_{A_{in}} x(s)\,\sigma(ds)$, $i = 1, \ldots, n$, $n \in N$,

where the sets $A_{in}$ are parallelepipeds with sides $[(i_k - 1)/n, i_k/n]$, $k = 1, \ldots, r$. Then

    $\sum_{k=1}^r (x^k_{i_k n} - x^k_{j_k n})(i_k/n - j_k/n) = \sum_{k=1}^r ((p_n x)^k_{i_k n} - (p_n x)^k_{j_k n})(i_k/n - j_k/n)$.

Define, similarly to (26), piecewise constant functions $(r_n (p_n x)_n)(s)
= (x^1_{in}, x^2_{in}, \ldots, x^r_{in})$, if $s_k \in [(i_k - 1)/n, i_k/n)$, $k = 1, \ldots, r$. Then we get

    $((r_n (p_n x)_n)(s) - (r_n (p_n x)_n)(t), s - t) \ge 0 \quad \forall s, t \in S$.

Consequently, the "there exists" part of the discrete convergence $C_n \to C$, $n \in N$,
is also verified. Now, together with Proposition 2, we can conclude that the sequence
of sets $\{C_n\}$ converges to the set $C$ of admissible solutions of the initial problem
(7) - (9) in the discrete Painlevé-Kuratowski sense. Proposition is proved.

Relying on Propositions 1 - 3, we can now formulate and prove the main
result of the paper.

Theorem 1 Let the function $f(x,s)$ be continuous in both variables $(x,s)$ and
satisfy growth and platform conditions (5) and (6), let the constraint set $C$ satisfy
conditions (7) - (9), and let the discrete measures $\{(m_n, s_n)\}$ converge weakly to the
initial measure $\sigma$. Then $v_n^* \to v^*$, $n \in N$, and the sequence of solutions $\{x_n^*\}$ of
approximate problems (16) has a subsequence which converges discretely to a
solution of the initial problem (2).

Proof. By the assumptions and by Proposition 3 the sequence of solutions $\{x_n^*\}$ of
approximate problems (16) is discretely compact, i.e., $\mathcal{P}\text{-}\lim x_n^* = x$,
$n \in N' \subset N$, and by Proposition 1 $\lim v_n(x_n^*) = v(x)$, $n \in N'$. Then clearly

    $v^* \ge v(x) = \lim v_n(x_n^*) = \lim v_n^*$, $n \in N'$

(by Proposition 3 the limit point $x$ is admissible but need not be optimal).

Prove the opposite inequality. By Proposition 3 for any $y \in C$ there exists
a sequence $\{y_n\}$, $y_n \in C_n$, such that $\mathcal{P}\text{-}\lim y_n = y$. Consider the difference
$v_n(x_n^*) - v(x^*)$. By Proposition 1 $v_n(y_n) \to v(x^*)$ if $y_n \to x^*$ (continuous
convergence of discrete approximations). Then

    $\limsup v_n^* = \limsup v_n(x_n^*) \ge \limsup v_n(y_n) = v(x^*) = v^*$, $n \in N$.

Consequently,

    $v^* = \lim v_n^* = \lim v_n(x_n^*)$, $n \in N$.    (29)

We can get the convergence of a subsequence of solutions of approximate problems
to a solution of the initial problem by contradiction: let $\mathcal{P}\text{-}\lim x_n^* = x$,
$n \in N' \subset N$, and assume $v(x) \ne v^*$. Now we get a contradiction with (29)
together with the continuous convergence of the sequence of approximate functionals
$\{v_n(x_n)\}$ to the probability functional $v(x)$ (Proposition 1). Theorem is proved.
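
For very small instances the discretized problem (16) can be solved by exhaustive search, which makes the statement of Theorem 1 easy to experiment with. The sketch below is an illustration only and not the approximation scheme of the paper: it assumes $r = 1$, a uniform midpoint grid on $S = [0,1]$ with $m_{in} = 1/n$, and decision values restricted to a finite list `levels`; all names and the example function are hypothetical.

```python
from itertools import combinations_with_replacement


def chi(tau):
    """Heaviside zero-one function: 1 for tau >= 0, 0 otherwise."""
    return 1.0 if tau >= 0 else 0.0


def v_n(x_n, s_n, f, t):
    """Discretized probability functional (15) with equal weights 1/n."""
    return sum(chi(t - f(x, s)) for x, s in zip(x_n, s_n)) / len(s_n)


def solve_discretized(levels, n, f, t):
    """Maximize v_n over nondecreasing vectors with entries taken from `levels`."""
    s_n = [(i + 0.5) / n for i in range(n)]
    # Drawing n items with replacement from the sorted levels enumerates
    # exactly the nondecreasing n-tuples, i.e. the monotone vectors of (19).
    candidates = combinations_with_replacement(sorted(levels), n)
    best = max(candidates, key=lambda x_n: v_n(x_n, s_n, f, t))
    return best, v_n(best, s_n, f, t)


if __name__ == "__main__":
    def f(x, s):
        # Hypothetical f: the decision should track the random parameter.
        return (x - s) ** 2

    print(solve_discretized([0.0, 0.5, 1.0], 4, f, t=0.05))
```

Bounding the entries of `levels` by $K$ keeps every candidate inside the discrete constraints (17) and (18) whenever $M \ge K$, so only monotonicity has to be enforced by the enumeration.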

Remark 7 The use of the space $L^1(\sigma)$ of integrable functions is essential. In
reflexive $L^p$-spaces, $1 < p < \infty$, serious difficulties arise with the application of the
strong (norm) compactness criterion for a maximizing sequence. We will see this
in the next section.



4 Approximation of Unconstrained Problems 

Approximation of an unconstrained maximization of the probability functional
$v(x)$ has its advantages and disadvantages. An advantage is that we do not
need to approximate a constraint set. But now we should guarantee discrete
compactness of the sequence of solutions of approximate problems itself. This
disadvantage imposes some restrictions on the functional $v(x)$ and on the space $L^p(\sigma)$.
We will assume that $v(x)$ vanishes at infinity and that the solution $x(s)$ belongs
to a reflexive $L^p$-space, $1 < p < \infty$, since in reflexive spaces a bounded sequence
is weakly compact.

Formulate the problem:

    $\max_{x \in L^p(\sigma)} v(x)$,    (30)

where $1 < p < \infty$. In order to guarantee that the superposition operator maps
$L^p(\sigma)$ into $L^1(\sigma)$, we should assume, instead of (5),

    $|f(x,s)| \le a(s) + \alpha |x|^p$, $a \in L^1(\sigma)$, $\alpha > 0$.    (31)

Discrete convergence $\mathcal{P}\text{-}\lim x_n = x$, $n \in N$, is in $L^p$-spaces defined as follows:

    $\sum_{i=1}^n |x_{in} - (p_n x)_{in}|^p m_{in} \to 0$, $n \in N$,

and weak discrete convergence, $w\mathcal{P}\text{-}\lim x_n = x$, $n \in N$, similarly:

    $\sum_{i=1}^n (z_{in}, x_{in}) m_{in} \to \int_S (z(s), x(s))\,\sigma(ds)$, $n \in N$ $\forall z \in L^q(\sigma)$,

where $1/p + 1/q = 1$.

Formulate the approximate problem:

    $\max v_n(x_n)$,    (32)

where $x_n \in R^{rn}$.

Proposition 4 Let the function $f(x,s)$, continuous in both variables $(x,s)$, satisfy
growth condition (31) and platform condition (6). Then from the convergence
$\mathcal{P}\text{-}\lim x_n = x$, $n \in N$, it follows that $v_n(x_n) \to v(x)$, $n \in N$.

The proof of the proposition is quite similar to the proof of Proposition 1 and is
omitted here.

From now on we will assume that the probability functional $v(x)$ vanishes at
infinity:

    $v(x_n) \to 0$, as $\|x_n\| \to \infty$, $n \in N$.    (33)

Definition 4 A sequence $\{v_n(x_n)\}$ of approximate functionals is uniformly
vanishing at infinity if from

    $\liminf_{n \in N} \|x_n\|_n = \infty$

it follows that

    $\limsup_{n \in N} v_n(x_n) = 0$.    (34)

Let $x^*$ and $x_n^*$ be solutions of problems (30) and (32), respectively.

Theorem 2 Let

1) the function $f(x,s)$ be continuous in both variables $(x,s)$ and satisfy growth
and platform conditions (31) and (6);

2) the functionals $v_n(x_n)$, $n \in N$, vanish uniformly at infinity;

3) the discrete measures $\{(m_n, s_n)\}$ converge weakly to the initial measure $\sigma$;

4) the norms $\|x_n^*\|_n$ of solutions of approximate problems (32) be bounded by the
norm $\|x\|$ of their weak discrete limit point $x$,

    $\limsup \|x_n^*\|_n \le \|x\|$, $n \in N$,

where $x = w\mathcal{P}\text{-}\lim x_n^*$, $n \in N$. Then

    $v_n^* \to v^*$, $n \in N$,

and the sequence of solutions $\{x_n^*\}$ of approximate problems (32) has a subsequence
which converges discretely to a solution of the initial problem (30).

Proof. Let $\{x_n^*\}$ be a sequence of solutions of approximate problems (32).
Due to the uniform vanishing assumption 2) it is bounded and hence weakly
discretely compact, i.e., for some function $x(s)$ integrable with the $p$-th power,
$x \in L^p(\sigma)$, the convergence

    $\sum_{i=1}^n ((p_n z)_{in}, x_{in}^*) m_{in} \to \int_S (z(s), x(s))\,\sigma(ds)$, $n \in N$ $\forall z \in L^q(\sigma)$

holds, where $1/p + 1/q = 1$. Since the $L^p$-norm is a convex functional, it is weakly
lower semicontinuous. In our discrete approximation case this means that

    $\|x\| \le \liminf \|x_n^*\|_n$, $n \in N' \subset N$,

and together with assumption 4) we get

    $\lim \|x_n^*\|_n = \|x\|$, $n \in N'$.

In reflexive $L^p$-spaces the discrete analogue of the Radon-Riesz property holds, i.e.,
if $w\mathcal{P}\text{-}\lim x_n^* = x$ and $\|x_n^*\|_n \to \|x\|$, $n \in N'$, then

    $\|x_n^* - p_n x\|_n \to 0$, $n \in N'$.

Consequently,

    $v^* \le v(x) = \lim v_n(x_n^*) = \lim v_n^*$, $n \in N' \subset N$

(a limit point $x$ need not be optimal yet).

Prove the opposite inequality. Consider the difference $v_n(x_n^*) - v(x^*)$. By
the definition of the discrete convergence, $\mathcal{P}\text{-}\lim p_n x^* = x^*$, $n \in N$. Hence,

    $v_n(x_n^*) - v(x^*) \le v_n(p_n x^*) - v(x^*) \le \varepsilon$

for a sufficiently small $\varepsilon$ and for $n$ sufficiently large. Consequently,

    $v^* \le v(x) \le \liminf v_n(x_n^*) \le \limsup v_n(x_n^*) \le v^*$.

Hence,

    $\lim v_n^* = v^*$, $n \in N$.

Discrete convergence of a subsequence $\{x_n^*\}$ of solutions of approximate
problems (32) to a solution $x^*$ of the initial problem (30) follows now from the
observation that from the sequence $\{x_n^*\}$, $n \in N'' = N \setminus N'$, we can again separate
a subsequence converging (to some $x$):

    $v^* \le v(x) \le \liminf v_n^* \le \limsup v_n^* \le v^*$, $n \in N''' \subset N \setminus N'$.

Discrete convergence of a subsequence of solutions of approximate problems (32)
to an optimal solution of the initial problem (30) follows now from the discrete
analogue of the Radon-Riesz property. Theorem is proved.

The author expresses his gratitude to the referee for helpful remarks and
suggestions.



Global Optimization of Probabilities by the
Stochastic Branch and Bound Method 



Vladimir Norkin 

Glushkov Institute of Cybernetics, 252207 Kiev, Ukraine 



Abstract. In this paper we extend the Stochastic Branch and Bound 
Method, developed in [7], [8] for stochastic integer and global opti- 
mization problems, to optimization problems with stochastic (expec- 
tation or chance) constraints. As examples we solve a problem of 
optimization of probabilities and a chance constrained programming 
problem with discrete decision variables. 

Key words. Branch and bound method, stochastic global optimiza- 
tion, optimization of probabilities, chance constrained programming 

1 Introduction 

The paper deals with a stochastic global optimization problem. The 
problem functions have the form of mathematical expectations or prob- 
abilities and depend on continuous and discrete variables. These vari- 
ables satisfy some, maybe nonlinear, constraints. To solve this problem 
we develop a certain version of the branch and bound method, which 
uses some specific stochastic bounds for optimal values of subproblems, 
generated by the method. As examples, we consider a problem of op- 
timization of probabilities (from pollution control area) and a chance 
constrained programming problem. 

Certainly, there is a broad literature on optimization of probabil- 
ities and chance constrained programming (see, for instance, [9], [2], 
[3], [5], [10], [6] and references therein), but most of these papers con- 
sider either conditions of (quasi)concavity of the probability functions 
or continuous local optimization problems. In the present paper we 
consider the problem of global optimization of probabilities. 

In the paper we extend the Stochastic Branch and Bound method,
developed in [7], [8], to problems with stochastic (in particular, chance)
constraints.

Formally, the problem under consideration has the form of finding
the global maximum

    $\max [F(x) = E f(x, \theta)]$,    (1)

subject to constraints

    $F_k(x) = E f_k(x, \theta) \le 0$,    (2)

    $x \in X \cap D \subset R^n$,    (3)

where $f(x, \theta)$, $f_k(x, \theta)$ are some nonconvex (for instance, quasi-convex)
functions, $X$ is a finite set with a simple structure (for example, the
intersection of some discrete lattice $\mathcal{L}$ and a parallelepiped $\mathcal{K}$ in $R^n$), the
set $D = \{x \in R^n \mid G(x) \le 0\}$ is given by some deterministic function
$G$ on $R^n$, and $E$ denotes the mathematical expectation with respect
to a random variable $\theta$, defined on some probability space $(\Theta, \Sigma, P)$.

If the original problem has a continuous decision variable $x \in X$,
then we can make it discrete assuming that $x$ belongs to some lattice
$\mathcal{L} \subset R^n$.

In this paper we are especially interested in the case where some
of the functions $F_k(x)$ have the form of a probability:

    $F_k(x) = P\{g_k(x, \theta) \in B\}$,    (4)

where $g_k : R^n \times \Theta \to R^m$ is a random vector function, $\theta \in \Theta$ is a
random variable, $B \subset R^m$. For instance, let

    $F_k(x) = P\{g_k(x, \theta) \ge 0\} = E\chi(g_k(x, \theta))$,

where $g_k : R^n \times \Theta \to R^1$, $\chi(t) = 1$ for $t \ge 0$ and $\chi(t) = 0$ for $t < 0$.
Then for a given $\theta$ the function

    $f_k(\cdot, \theta) = \chi(g_k(\cdot, \theta))$

is nonconvex and discontinuous, and thus $F_k(x)$ can be nonsmooth,
nonconvex or even discontinuous.

To solve problem (1) - (3) we develop a special version of the
stochastic branch and bound method, which takes into account nonlinear
stochastic constraints (2). Besides, special stochastic lower bounds
of the optimal value of problem (1) - (3), based on the interchange of
the minimization and the expectation (or the probability) operators,
are constructed. Earlier, similar bounds were used in [7] and [8] for
solving some stochastic discrete and continuous global optimization
problems. In the present paper we deal with probability objective
functions (4) and stochastic (chance) constraints (2).

2 Illustrative Example: Pollution Control

As a possible application we consider a pollution control problem (see,
for example, [1], [4], [7]).

In the simplest pollution control problem there are emission sources
$j = 1, \ldots, n$ and receptor points $i = 1, \ldots, m$. For every source $j$ a set
$X_j$ of possible emission levels $x_j$ is available. Each solution $x_j \in X_j$
has the cost $c_j(x_j)$. The emissions are transferred to receptors and
produce depositions

    $y_i(x, \theta) = \sum_{j=1}^n t_{ij}(\theta) x_j$,

where $x = (x_1, \ldots, x_n)$, $t_{ij}(\theta)$ are some random transfer coefficients, and
$\theta$ is a random variable (weather conditions). For simplicity, we can
assume that $\theta$ takes on a finite number of values (scenarios). There are
some target levels (ambient norms) $q_i$ of depositions for the receptors
$i = 1, \ldots, m$. We consider two decision-making problems. The first
one is to minimize the probability of violating the ambient norms:

    $\max F(x) = P\{\sum_{j=1}^n t_{ij}(\theta) x_j \le q_i,\ i = 1, \ldots, m\}$    (5)

under the budget constraint

    $\sum_{j=1}^n c_j(x_j) \le r$,    (6)

    $x_j \in X_j$, $j = 1, \ldots, n$,    (7)

where $r$ denotes the available resource.

Another problem is to minimize the cost function 

G(x) = 'f^Cj(xj) (8) 

i=i 



under the risk constraint 

n 

F{x) = P{'^tij{9)xj <qi, t = 1, . . .,m} > £v; (9) 

i=i 



Xj e Xj, j = l,...,n. 



( 10 ) 
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where a is some reliability level, 0 < a < 1. 

The discreteness of Xj in these examples is provided either by the 
artificial discretion of possible emission levels or by the fact that there 
is a finite number of possible emission reduction technologies available 
at source j. 

Note that for these particular problems both the probability func- 
tion F{x) and the cost function G{x) are monotonously decreasing in 

= (xi, . ..,Xn). 
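
For a handful of scenarios and emission levels, the cost minimization problem under the risk constraint can be solved by plain enumeration, which makes the roles of F, G and the reliability level concrete. The sketch below is illustrative only: the three scenarios, the norms, the cost weights and the level 3/4 are invented data, and `cheapest_reliable_plan` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

# Assumed toy data: 2 sources, 2 receptors, 3 scenarios (probability, t[i][j]).
SCENARIOS = [
    (Fraction(1, 2), [[1, 2], [2, 1]]),
    (Fraction(1, 4), [[2, 2], [1, 3]]),
    (Fraction(1, 4), [[3, 1], [2, 2]]),
]
NORMS = [8, 8]                  # ambient norms q_i
LEVELS = [range(4), range(4)]   # emission levels X_j = {0, 1, 2, 3}
WEIGHTS = [2, 3]                # cost of one unit of emission reduction
ALPHA = Fraction(3, 4)          # required reliability level


def reliability(x):
    """F(x): probability that every receptor meets its norm."""
    total = Fraction(0)
    for prob, t in SCENARIOS:
        if all(sum(tij * xj for tij, xj in zip(row, x)) <= q
               for row, q in zip(t, NORMS)):
            total += prob
    return total


def cost(x):
    """G(x): decreasing in x, since lower emissions cost more."""
    return sum(w * (3 - xj) for w, xj in zip(WEIGHTS, x))


def cheapest_reliable_plan():
    """Enumerate the lattice and return the cheapest plan with F(x) >= ALPHA."""
    feasible = [x for x in product(*LEVELS) if reliability(x) >= ALPHA]
    best = min(feasible, key=cost)
    return best, cost(best)


if __name__ == "__main__":
    print(cheapest_reliable_plan())
```

Enumeration visits every lattice point, so it is only usable for very small n; the branch and bound method of the following sections avoids this by discarding whole boxes of emission levels at once.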

3 Assumptions 

Consider a special case of problem (1)-(3): 

    $\max_x \{F(x) \mid x \in X \cap D\}$.    (11) 

We assume that this problem has feasible solutions. Denote the set of 
optimal solutions $X^*$ and the optimal value $F^*$. 

In the branch and bound method the original problem (11) is 
subdivided into subproblems: 

    $\max \{F(x) \mid x \in X' \cap D\}$, $X' \subset X$. 

Denote $F^*(X' \cap D)$ the optimal value of this subproblem (by definition 
$F^*(\emptyset) = +\infty$). 

We assume that there are some lower $L(X')$ and upper $U(X')$ 
bounds (set defined functions) for optimal values $F^*(X' \cap D)$, $X' \subset X$. 
As a lower bound of $F^*(X' \cap D)$ we use the value 

(i) $L(X') = F(s(X')) \le F^*(X' \cap D)$ of the objective function at 
some point $s(X') \in X' \cap D$. By definition, $L(X') = +\infty$ if $X' \cap D = \emptyset$. 
Obviously, 

(ii) if $X'$ is a singleton then $L(X') = F(X')$. 

Assume that the set defined function $U$ has the following 
properties. 

(iii) $U(X') \ge F^*(X' \cap D)$ if $X' \cap D \ne \emptyset$. By definition, $U(X') = 
+\infty$ if $X' \cap D = \emptyset$. 

(iv) If $X'$ is a singleton then $U(X') = F(X')$. 

Moreover, we assume that the bounds $L(X')$ and $U(X')$ are not known 
exactly and there are only some statistical estimates converging to 
them in the following limit sense. 

(v) Assume that for each subset $X' \subset X$ there is a probability 
space $(\Omega_{X'}, \Sigma_{X'}, P_{X'})$ and a collection of random variables (stochastic 
bounds) $\xi^k(X',\omega')$, $\eta^k(X',\omega')$, k = 0, 1, ..., defined on this space, such 
that 

    $\lim_k \xi^k(X',\omega') = L(X')$  $P_{X'}$-a.s.,    (12) 

    $\lim_k \eta^k(X',\omega') = U(X')$  $P_{X'}$-a.s.    (13) 

For example, as stochastic lower bounds we can use empirical 
estimates of $F(s(X'))$ computed from i.i.d. observations $\theta_i$ of $\theta$. 
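
One natural empirical estimate of this kind is a sample mean evaluated at the feasible point s(X'). The sketch below assumes that form; the integrand `indicator`, the four-point distribution of theta and the function names are illustrative assumptions rather than the paper's definitions.

```python
import random


def empirical_lower_bound(f, point, observations):
    """Sample mean of f(point, theta_i) over i.i.d. observations theta_i."""
    return sum(f(point, th) for th in observations) / len(observations)


def indicator(x, theta):
    """Assumed integrand: 1 if theta - x >= 0, else 0."""
    return 1 if theta - x >= 0 else 0


if __name__ == "__main__":
    rng = random.Random(0)
    # theta uniform on {1, 2, 3, 4}; the exact value of F(1.5) is then 0.75.
    sample = [rng.choice([1, 2, 3, 4]) for _ in range(20000)]
    print(empirical_lower_bound(indicator, 1.5, sample))
```

By the strong law of large numbers such an estimate converges almost surely as the sample grows, which is the kind of convergence required in (12).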

Methods to construct stochastic upper bounds for different classes 
of stochastic optimization problems (with expectation objective func- 
tions) are developed in [7], [8]. In Section 5 we develop such bounds 
for probability functions. 

4 The Stochastic Branch and Bound Method 

In the next algorithm for solving (11) we assume that one can relatively 
easily find a feasible point $x' \in X' \cap D$ for any simple (partition) set 
$X' \subset X$ or find out that $X' \cap D = \emptyset$. For instance, such a feasible point 
$x'$ can easily be found for parallelepipeds $X'$, $X$ and a monotone 
constraint function G(x). 

First of all we define a new probability space $(\Omega, \Sigma, P)$ on which the 
algorithm works. Let us reserve some probability space $(\Omega_a, \Sigma_a, P_a)$ 
for stochastic components of the algorithm itself. Define 

    $(\Omega, \Sigma, P) = (\Omega_a, \Sigma_a, P_a) \times \prod_{X' \subset X} (\Omega_{X'}, \Sigma_{X'}, P_{X'})$ 

as the product of the probability spaces $(\Omega_a, \Sigma_a, P_a)$ and $(\Omega_{X'}, \Sigma_{X'}, P_{X'})$, 
where $X'$ runs over all subsets of $X$. 

Now we consider that all quantities appearing in the algorithm 
(random partitions $\mathcal{P}_k$, bounds $\xi_k$, $\eta_k$, sets $X^k$, indices $M_k$, 
etc.) are defined on this common probability space $(\Omega, \Sigma, P)$; in 
particular, for $\xi^k$ and $\eta^k$ we have 

    $\lim_k \xi^k(X',\omega) = L(X')$  P-a.s.,    (14) 

    $\lim_k \eta^k(X',\omega) = U(X')$  P-a.s.,    (15) 

for all $X' \subset X$. 

For brevity we skip the argument $\omega$ from all random quantities 
generated by the following stochastic branch and bound algorithm. 

Initialization. Form the initial partition $\mathcal{P}_0 = \{X\}$. Calculate the 
bounds $\xi_0 = \xi^{M_0}(X)$ and $\eta_0 = \eta^{M_0}(X)$. Set the iteration number k = 1. 

Iterations (k = 1, 2, ...). At the beginning of the k-th iteration there 
is a partition $\mathcal{P}_{k-1}$ (a collection of subsets of $X$). For each set $X' \in 
\mathcal{P}_{k-1}$ there are estimates $\xi_{k-1}(X')$ and $\eta_{k-1}(X')$. Each iteration consists 
of the following steps. 

Partitioning (branching). Select a record subset 

    $X^k \in \mathrm{Arg\,max}\{\eta_{k-1}(X') : X' \in \mathcal{P}_{k-1}\}$ 

and an approximate solution 

    $y^k \in Y^k \in \mathrm{Arg\,max}\{\xi_{k-1}(X') : X' \in \mathcal{P}_{k-1}\}$. 

If the record subset $X^k$ is a singleton, then put $\mathcal{P}'_k := \mathcal{P}_{k-1}$ and 
go to the Bound Estimation step. Otherwise, construct a partition 
$\mathcal{P}''_k(X^k) = \{X^k_i,\ i = 1, \ldots, n_k\}$ of the record set $X^k$ such that $X^k = 
X^k_1 \cup \ldots \cup X^k_{n_k}$. Define the new full partition 

    $\mathcal{P}'_k = (\mathcal{P}_{k-1} \setminus X^k) \cup \mathcal{P}''_k(X^k)$. 

Bound Estimation. For all partition elements $X' \in \mathcal{P}'_k$ select 
stochastic estimates $\xi_k(X') = \xi^{M_k(X')}(X')$ of $L(X')$ and $\eta_k(X') = 
\eta^{M_k(X')}(X')$ of $U(X')$, where $M_k(X')$ is a random index. 

Deletion. Clean the partition $\mathcal{P}'_k$ of infeasible subsets, defining 

    $\mathcal{P}_k = \mathcal{P}'_k \setminus \{X' \in \mathcal{P}'_k : X' \cap D = \emptyset\}$. 

The End of an Iteration. Set k := k + 1 and go to the Partitioning 
step. 

Remark to Deletion. If the estimates $\xi_k(X')$, $\eta_k(X')$ are a.s. 
exact, i.e. if besides (12), (13) we have $\xi_k(X') \le L(X')$ and $\eta_k(X') \ge 
U(X')$ a.s., then at the Deletion step one can also delete the so-called 
nonperspective sets $X' \in \mathcal{P}'_k$ for which $\eta_k(X') < \max_{X'' \in \mathcal{P}'_k} \xi_k(X'')$. 
Other stochastic deletion rules are discussed in [8]. 

Theorem 4.1 (Norkin et al. [7]) Suppose that $X \cap D \ne \emptyset$. Let indices 
$M_k(X',\omega)$ be chosen in such a way that 

    $\lim_k M_k(X',\omega) = +\infty$ a.s. 

for any fixed $X' \subset X$. Then with probability one there exists an 
iteration number $K(\omega)$ such that for all $k \ge K(\omega)$ 

the record sets $X^k(\omega)$ are singletons and $X^k(\omega) \subset X^*$; 
the approximate solutions $y^k(\omega) \in X^*$; 

and $\lim_k \eta_{k-1}(X^k(\omega)) = F^*$. 
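
The loop structure of the method (select the record set, split it, re-estimate bounds, delete infeasible sets) can be sketched on a toy budget-constrained problem in the spirit of (5)-(7). Everything concrete below is an assumption made for illustration: the integer data, the names `F`, `G`, `bounds`, `split`, and in particular the use of exact bounds L = F(d), U = F(c) on a box [c, d], which are valid because F and G are decreasing in x. The stochastic estimates of the method are replaced here by these exact values, so this is a deterministic simplification, not the algorithm as stated.

```python
from fractions import Fraction

# Assumed toy data: 2 sources, 1 receptor, 3 equiprobable scenarios.
T = [(2, 5), (4, 3), (6, 6)]   # transfer coefficients t_j(theta)
P = [Fraction(1, 3)] * 3       # scenario probabilities
Q, R, TOP = 21, 5, 5           # ambient norm, budget, largest emission level


def F(x):
    """Probability that the deposition stays within the norm (decreasing in x)."""
    return sum((p for p, t in zip(P, T) if t[0] * x[0] + t[1] * x[1] <= Q),
               Fraction(0))


def G(x):
    """Cost of holding emissions at level x (decreasing in x)."""
    return sum(TOP - xj for xj in x)


def bounds(box):
    """Exact (lower, upper) bounds on a box [c, d], or None if it is infeasible."""
    c, d = box
    if G(d) > R:           # even the cheapest point of the box breaks the budget
        return None
    return F(d), F(c)      # d is feasible; c maximizes F over the box


def split(box):
    """Halve the box along its longest edge."""
    c, d = box
    j = max(range(len(c)), key=lambda i: d[i] - c[i])
    mid = (c[j] + d[j]) // 2
    return [(c, d[:j] + (mid,) + d[j + 1:]),
            (c[:j] + (mid + 1,) + c[j + 1:], d)]


def branch_and_bound():
    partition = [((0, 0), (TOP, TOP))]
    while True:
        # Deletion: drop boxes that contain no feasible point.
        partition = [b for b in partition if bounds(b) is not None]
        # Record set: the box with the largest upper bound.
        record = max(partition, key=lambda b: bounds(b)[1])
        if record[0] == record[1]:          # singleton record set: stop
            return record[0], F(record[0])
        partition.remove(record)
        partition.extend(split(record))


if __name__ == "__main__":
    print(branch_and_bound())
```

With exact bounds the first singleton record set is optimal, because its value is at least the upper bound of every other set in the partition. With stochastic bounds this holds only in the limit, which is the content of Theorem 4.1.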

5 Optimization of Probabilities 

Consider the problems 

    $\max_{x \in X' \cap D} \, [F(x) = P\{f(x,\theta) \in B\}] = F^*(X' \cap D)$,    (16) 

where $X' \subset X$, $X$ and $D$ have the same meaning as in (3), $\theta \in \Theta$, 
$(\Theta, \Sigma, P)$ is some probability space, $f(x,\theta) = (f_1(x,\theta), \ldots, f_m(x,\theta))$ 
is a random vector function, and $B$ is a closed subset of $R^m$. In the next 
two subsections we give upper bounds (estimates) for $F^*(X' \cap D)$. 

5.1 The Interchange Relaxation 

Let us estimate $F^*(X' \cap D)$ from above by interchange of the 
maximization and the probability operators (the interchange relaxation). 
Obviously, 

    $F^*(X' \cap D) \le P\{\exists x'(\theta) \in X' \cap D : f(x'(\theta),\theta) \in B\} = U(X')$ 

    $\le P\{\exists x'(\theta) \in \bar{X}' \cap D : f(x'(\theta),\theta) \in B\} = \bar{U}(X')$, 

where $\bar{X}'$ denotes the convex hull of $X'$. 

Let us introduce indicator functions $\chi_{A(X')}(\theta)$, $\chi_{\bar{A}(X')}(\theta)$ of the 
events 

    $A(X') = \{\theta \in \Theta \mid \exists x'(\theta) \in X' \cap D : f(x'(\theta),\theta) \in B\}$, 

    $\bar{A}(X') = \{\theta \in \Theta \mid \exists x'(\theta) \in \bar{X}' \cap D : f(x'(\theta),\theta) \in B\}$, 

that is, $\chi_{A(X')}(\theta) = 1$ if $\theta \in A(X')$ and 0 otherwise, and 
$\chi_{\bar{A}(X')}(\theta) = 1$ if $\theta \in \bar{A}(X')$ and 0 otherwise. Then 

    $U(X') = E\chi_{A(X')}(\theta)$,  $\bar{U}(X') = E\chi_{\bar{A}(X')}(\theta)$. 

To calculate the stochastic estimate $\eta(X',\theta) = \chi_{A(X')}(\theta)$ of $U(X')$ 
one has to check for a given $\theta$ the feasibility of conditions $f(x',\theta) \in B$, 
$x' \in X' \cap D$. 

To calculate the stochastic estimate $\bar{\eta}(X',\theta) = \chi_{\bar{A}(X')}(\theta)$ of $\bar{U}(X')$ 
one has to check for a given $\theta$ the feasibility of conditions $f(x',\theta) \in B$, 
$x' \in \bar{X}' \cap D$. 

If functions $f_i(x,\theta)$, i = 1, ..., m, are linear in x, $B$ is a polyhedral 
set in $R^m$ and $D$ is given by linear constraints, then the problem 
of checking feasibility of conditions $f(x',\theta) \in B$, $x' \in X' \cap D$ is a 
linear integer programming problem (and a linear programming problem for 
conditions $f(x',\theta) \in B$, $x' \in \bar{X}' \cap D$). 

If the set $\Theta$ is finite and feasibility is checked easily and quickly, 
then $U(X')$ and $\bar{U}(X')$ can be calculated exactly. Otherwise we have to 
use some statistical (empirical) estimates for $U(X')$ and $\bar{U}(X')$, based 
on observations of $\theta$ and the calculation of $\chi_{A(X')}(\theta)$, $\chi_{\bar{A}(X')}(\theta)$. 

An important particular case of (16) is the problem of the 
optimization of the probability to exceed a given threshold: 

    $\max_{x \in X' \cap D} \, [F(x) = P\{f(x,\theta) \ge c\}]$, 

where $f(x,\theta)$ is a random function of x and c is a given threshold. In this 
case 

    $U(X') = E\chi(\max_{x \in X' \cap D} f(x,\theta) - c) = P\{\max_{x \in X' \cap D} f(x,\theta) \ge c\}$, 

    $\bar{U}(X') = E\chi(\max_{x \in \bar{X}' \cap D} f(x,\theta) - c) = P\{\max_{x \in \bar{X}' \cap D} f(x,\theta) \ge c\}$, 

where $\chi(t) = 0$ if $t < 0$ and $\chi(t) = 1$ otherwise, and $\bar{X}'$ is the convex hull 
of $X'$. If the problem of maximizing $f(x,\theta)$ over $x \in X' \cap D$ (or 
over $x \in \bar{X}' \cap D$) is relatively simple (linear, convex or concave on a 
polyhedral set), then the above quantities $U(X')$, $\bar{U}(X')$ are practical 
estimates of $F^*(X' \cap D)$. 
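
The gap between the optimal probability and the interchange bound shows up already with two scenarios. The sketch below compares the two quantities for the threshold case on the lattice set itself (not its convex hull). The scenarios, the linear function f(x, theta) = theta . x, the threshold and the feasible set are assumed toy data.

```python
from fractions import Fraction

# Assumed toy data: two equiprobable scenarios, threshold c = 1.
SCENARIOS = [(Fraction(1, 2), (1, 0)), (Fraction(1, 2), (0, 1))]
C = 1
# Points of {0,1}^2 with x1 + x2 <= 1 play the role of the feasible part of X'.
FEASIBLE = [(0, 0), (1, 0), (0, 1)]


def f(x, theta):
    """Assumed linear random function f(x, theta) = theta . x."""
    return sum(t * xi for t, xi in zip(theta, x))


def F(x):
    """F(x) = P{f(x, theta) >= c}."""
    return sum((p for p, th in SCENARIOS if f(x, th) >= C), Fraction(0))


def best_probability():
    """Optimal value: one x must serve all scenarios."""
    return max(F(x) for x in FEASIBLE)


def interchange_bound():
    """U = P{max_x f(x, theta) >= c}: x may be chosen separately per scenario."""
    return sum((p for p, th in SCENARIOS
                if max(f(x, th) for x in FEASIBLE) >= C), Fraction(0))


if __name__ == "__main__":
    print(best_probability(), interchange_bound())
```

Here every scenario can be satisfied by some x, but no single x satisfies both, so the bound equals 1 while the optimal value is 1/2. The bound is valid but loose, which motivates sharper bounds based on several observations of theta.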

5.2 Multiple Sampling Bounds 

One can make the bounds of section 5.1 sharper by using multiple sampling 
of $\theta$. Let $\theta^l = (\theta_1, \ldots, \theta_l)$ be l i.i.d. observations of $\theta$. Then 

    $[\max_{x \in X' \cap D} F(x)]^l = \max_{x \in X' \cap D} F^l(x)$ 

    $= \max_{x \in X' \cap D} P\{f(x,\theta_1) \in B\} \times \ldots \times P\{f(x,\theta_l) \in B\}$ 

    $\le P^l\{\exists x'(\theta^l) \in \bar{X}' \cap D : f(x'(\theta^l),\theta_1) \in B, \ldots, f(x'(\theta^l),\theta_l) \in B\}$, 

where $\bar{X}'$ is the convex hull of $X'$ and $P^l$ is the appropriate probability 
measure on $\Theta^l$. Thus 

    $U_l(X') = [P^l\{\exists x'(\theta^l) \in \bar{X}' \cap D : f(x'(\theta^l),\theta_1) \in B, \ldots, f(x'(\theta^l),\theta_l) \in B\}]^{1/l}$ 

is an upper bound for the probabilities $F(x)$, $x \in X' \cap D$, satisfying 
condition (iv). Similar bounds for expectations and probabilities were 
used in [8]. 

The estimate $U_l(X')$ is relatively easily calculated in the case of 
linear functions $f(x,\theta)$. 

For the probability to exceed a given threshold, the above estimate 
has the form: 

    $U_l(X') = [P^l\{\max_{x \in \bar{X}' \cap D} \min_{1 \le i \le l} f(x,\theta_i) \ge c\}]^{1/l}$. 

Advanced estimates $U_l(X')$ are not worse than the simple estimate 
$U_1(X')$. Indeed, 

    $U_l(X') \le [P^l\{\exists x_1(\theta_1), \ldots, x_l(\theta_l) \in \bar{X}' \cap D : 
        f(x_1(\theta_1),\theta_1) \in B, \ldots, f(x_l(\theta_l),\theta_l) \in B\}]^{1/l}$ 

    $= [P\{\exists x_1(\theta_1) \in \bar{X}' \cap D : f(x_1(\theta_1),\theta_1) \in B\} \times \ldots 
        \times P\{\exists x_l(\theta_l) \in \bar{X}' \cap D : f(x_l(\theta_l),\theta_l) \in B\}]^{1/l}$ 

    $= P\{\exists x(\theta) \in \bar{X}' \cap D : f(x(\theta),\theta) \in B\} = U_1(X')$. 
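
For a finite scenario set the multiple sampling bound can be computed exactly by enumerating all l-tuples of scenarios. The sketch below does this for the threshold case, using the lattice set itself in place of its convex hull (a simplification). The two scenarios, the function f and the feasible points are assumed toy data, and `multi_sample_bound` is a hypothetical name.

```python
from itertools import product

# Assumed toy data: two equiprobable scenarios, f(x, theta) = theta . x, c = 1.
SCENARIOS = [(0.5, (1, 0)), (0.5, (0, 1))]
C = 1
FEASIBLE = [(0, 0), (1, 0), (0, 1)]   # stands in for the feasible part of X'


def f(x, theta):
    return sum(t * xi for t, xi in zip(theta, x))


def multi_sample_bound(l):
    """[P^l{ exists x with f(x, theta_i) >= c for all i = 1..l }]^(1/l)."""
    total = 0.0
    for draw in product(SCENARIOS, repeat=l):
        weight = 1.0
        for prob, _ in draw:
            weight *= prob
        # One common x has to work for every observation in the draw.
        if any(all(f(x, th) >= C for _, th in draw) for x in FEASIBLE):
            total += weight
    return total ** (1.0 / l)


if __name__ == "__main__":
    for l in (1, 2, 3):
        print(l, multi_sample_bound(l))
```

For these data the optimal probability is 1/2 and the bound equals 2^((1-l)/l): it is 1 for l = 1 and decreases towards 1/2 as l grows. The cost is that the number of scenario tuples grows exponentially in l, so in practice the probability would be estimated by sampling.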

6 The Constrained Stochastic Branch and Bound 
Method 

Now we consider global optimization problem (1)- (3) with nonconvex 
stochastic constraints (2). Then we cannot easily point out feasible 
points for subproblems of this problem. Such points have to be elab- 
orated by the algorithm. In this section we develop a version of the 
branch and bound method which explicitly treats nonconvex (stochas- 
tic) constraints of the problem. 




6.1 Assumptions 

Consider the problem 

    $\max\{F(x) \mid x \in X \cap D \cap C\}$, 

where $X$ and $D$ have the same meaning as in (3), and the set $C$ is given 
by the inequality constraint: 

    $C = \{x \in R^n \mid H(x) \le 0\}$. 

For instance, for constraints (2) we can put $H(x) = \max_{1 \le k \le K} F_k(x)$. 

We assume that $X \cap D \cap C \ne \emptyset$. To check this one can apply the 
Stochastic Branch and Bound Method from Section 4 to the problem 

    $\max\{-H(x) \mid x \in X \cap D\}$. 

As before, we assume that for any $X' \subset X$ one can easily find 
a point $x' \in X' \cap D$ or find out that $X' \cap D = \emptyset$. The function $H(x)$ 
is supposed to be essentially nonlinear and stochastic. Let $X'$ be a 
(partition) subset of $X$. Consider the auxiliary optimization problems 

    $\max_{x \in X' \cap D} F(x)$  and  $\min_{x \in X'} H(x)$. 

Denote $F^*(X' \cap D)$ and $H^*(X')$ the optimal values of these problems, 
respectively. For some positive $\epsilon$ denote $C_\epsilon = \{x \in R^n \mid H(x) \le \epsilon\}$. 

The Branch and Bound method uses lower and upper bounds for 
$F^*(X' \cap D)$, $X' \subset X$. We assume that there are set defined functions 
$U$ and $L$ satisfying requirements (i)-(v) of Section 3. 

The method also uses lower bounds for $H^*(X')$, $X' \subset X$. So, we 
assume that there exists a set defined function $l(X')$ such that: 

(vi) $l(X') \le H^*(X')$ for any $X' \subset X$; 

(vii) for any singleton $X'$, $l(X') = H(X')$; 

(viii) for all subsets $X' \subset X$ there are sequences of random 
variables $\lambda^k(X',\omega)$, k = 0, 1, ..., defined on some probability space 
$(\Omega_{X'}, \Sigma_{X'}, P_{X'})$ such that 

    $\lim_k \lambda^k(X',\omega) = l(X')$ a.s.    (17) 

6.2 The Algorithm 

In the following algorithm we do not assume that for any partition 
subset $X' \subset X$ a feasible point $x' \in X' \cap D \cap C$ is known. $\epsilon$-Feasible 
points $x^k \in X^k \cap D \cap C_\epsilon$ are elaborated by the algorithm. To do this, 
stochastic estimates of the lower bounds $l(X')$ of the optimal values 
$H^*(X')$ are used. Thus this algorithm can be applied to problems 
with nonlinear and nonconvex constraints such as (2), (9). To provide 
convergence of the method we relax a little (by $\epsilon > 0$) the constraint 
$H(x) \le 0$. As in Section 4 we consider that the method works on the 
common probability space $(\Omega, \Sigma, P)$. For brevity we skip the argument 
$\omega$ from all random quantities generated by the algorithm. 

Initialization. Form the initial partition $\mathcal{P}_0 = \{X\}$. Calculate the 
bounds $\xi_0(X) = \xi^{M_0}(X)$, $\eta_0(X) = \eta^{M_0}(X)$ and $\lambda_0(X) = \lambda^{M_0}(X)$, where $M_0$ 
is some initial index. Set the iteration number k = 1. 

Iterations (k = 1, 2, ...). At the beginning of the k-th iteration there 
is a partition $\mathcal{P}_{k-1}$ of the remaining part of $X$. For each set $X' \in \mathcal{P}_{k-1}$ 
there are estimates $\xi_{k-1}(X')$, $\eta_{k-1}(X')$, $\lambda_{k-1}(X')$. Each iteration 
consists of the following steps. 

Partitioning (branching). If $\mathcal{P}^\epsilon_{k-1} = \{X' \in \mathcal{P}_{k-1} : \lambda_{k-1}(X') \le 
\epsilon\} \ne \emptyset$ then select a record subset 

    $X^k \in \mathrm{Arg\,max}\{\eta_{k-1}(X') : X' \in \mathcal{P}^\epsilon_{k-1}\}$, 

and an approximate solution 

    $y^k \in Y^k \in \mathrm{Arg\,max}\{\xi_{k-1}(X') : X' \in \mathcal{P}^\epsilon_{k-1}\}$. 

If $\mathcal{P}^\epsilon_{k-1} = \emptyset$ or the record subset $X^k$ is a singleton, then set $\mathcal{P}'_k := \mathcal{P}_{k-1}$ 
and go to the Bound Estimation step to improve bounds. Otherwise, 
construct a partition $\mathcal{P}''_k(X^k) = \{X^k_i,\ i = 1, \ldots, n_k\}$ of the record set 
$X^k$ such that $X^k = X^k_1 \cup \ldots \cup X^k_{n_k}$. Define a new full partition 

    $\mathcal{P}'_k = (\mathcal{P}_{k-1} \setminus X^k) \cup \mathcal{P}''_k(X^k)$. 

Bound Estimation. For all partition elements $X' \in \mathcal{P}'_k$ select 
stochastic estimates $\xi_k(X') = \xi^{M_k(X')}(X')$ of $L(X')$, $\eta_k(X') = 
\eta^{M_k(X')}(X')$ of $U(X')$ and $\lambda_k(X') = \lambda^{M_k(X')}(X')$ of $l(X')$, where $M_k(X')$ 
is a random index. 

Deletion. Clean the partition $\mathcal{P}'_k$ of infeasible subsets, defining 

    $\mathcal{P}_k = \mathcal{P}'_k \setminus \{X' \in \mathcal{P}'_k : X' \cap D = \emptyset\}$. 

The End of an Iteration. Set k := k + 1 and go to the Partitioning 
step. 

Remark to Partitioning. The branching strategy to partition 
the record set guarantees that $X^k$ is a.s. a singleton for all 
sufficiently large k. Indeed, due to finiteness of $X$, after a finite number of 
iterations the partition $\mathcal{P}_k$ becomes stable (unchanging) and contains 
a set $X_0$ such that $l(X_0) \le 0$. Then by (17) for all sufficiently large 
k, $\lambda_k(X_0) \le \epsilon$ a.s. Hence record subsets $X^k$ are selected and are 
singletons in this stable partition. The other strategy is to partition 
the largest set from the collection 

    $\{X' \in \mathcal{P}_{k-1} \mid \xi_{k-1}(X') \ge \xi_{k-1}(X^k),\ \lambda_{k-1}(X') \le \epsilon\}$. 

Such a strategy guarantees that both record sets $X^k$ and $Y^k$ are 
singletons for all sufficiently large k. 

Remark to Deletion. If the estimates $\lambda_k(X')$ are a.s. exact, i.e. 
besides (17) we have $\lambda_k(X') \le l(X')$ a.s., then one can safely delete 
from $\mathcal{P}'_k$ infeasible sets $X'$ such that $\lambda_k(X') > \epsilon$, $\epsilon > 0$. 
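
The only change in the Partitioning step compared with Section 4 is the filter on the constraint estimates. A minimal sketch of that selection rule is given below; the function name, the dictionary representation of the estimates and the numbers in the usage example are assumptions for illustration.

```python
def select_record_and_solution(partition, eta, xi, lam, eps):
    """Pick the record set and the approximate-solution set among the sets
    whose constraint estimate passes the eps-test; return None if none does."""
    candidates = [s for s in partition if lam[s] <= eps]
    if not candidates:
        return None                                  # only improve the bounds
    record = max(candidates, key=lambda s: eta[s])   # largest upper estimate
    solution = max(candidates, key=lambda s: xi[s])  # largest lower estimate
    return record, solution


if __name__ == "__main__":
    # Hypothetical estimates for three partition sets labelled A, B, C.
    eta = {"A": 0.9, "B": 0.8, "C": 0.7}
    xi = {"A": 0.2, "B": 0.5, "C": 0.6}
    lam = {"A": 0.3, "B": 0.05, "C": -0.1}
    print(select_record_and_solution(["A", "B", "C"], eta, xi, lam, eps=0.1))
```

In the usage example set A has the largest upper estimate but is excluded because its constraint estimate exceeds eps, so the record set is B and the approximate solution is taken from C.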

The next theorem states a convergence result for the method. 

Theorem 6.1 Suppose that $X \cap D \cap C \ne \emptyset$. Let indices $M_k(X',\omega)$ be 
chosen in such a way that for any fixed $X' \subset X$ 

    $\lim_k M_k(X',\omega) = +\infty$ a.s.    (18) 

Then with probability one there exists an iteration number $K(\omega)$ such 
that for all $k \ge K(\omega)$ 

(a) the record sets $X^k(\omega)$ are singletons, $X^k(\omega) \subset X \cap D \cap C_\epsilon$ and 

    $\max_{x \in X \cap D \cap C} F(x) \le F(X^k(\omega)) \le \max_{x \in X \cap D \cap C_\epsilon} F(x)$, 

(b) if the partition strategy guarantees that $Y^k(\omega)$ becomes a 
singleton as $k \to \infty$ (see Remark to Partitioning), then the approximate 
solutions $y^k(\omega) \in X \cap D \cap C_\epsilon$ and 

    $\max_{x \in X \cap D \cap C} F(x) \le F(y^k(\omega)) \le \max_{x \in X \cap D \cap C_\epsilon} F(x)$. 



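Proof. Due to the finiteness of $X$, for each $\omega$ after some iteration k = 
$K_0(\omega)$ the current partition $\mathcal{P}_k(\omega)$ becomes stable (unchanging), i.e. 
$\mathcal{P}_k(\omega) = \mathcal{P}_\infty(\omega)$. Denote $\Omega'$ the set of those $\omega \in \Omega$ such that (14), (15), (17) 
and (18) take place for all $X' \subset X$, $P(\Omega') = 1$. For a fixed $\omega \in \Omega'$ and 
$X' \in \mathcal{P}_\infty$ we have $\lim_k M_k(X',\omega) = +\infty$ and hence $\lim_k \xi_k(X',\omega) = 
L(X')$, $\lim_k \eta_k(X',\omega) = U(X')$, $\lim_k \lambda_k(X',\omega) = l(X')$. 

Now prove (a). Fix some $\omega \in \Omega'$. The current partition always 
contains a set $X_0$ such that $l(X_0) \le 0$. So for $\omega \in \Omega'$ by (17) the set 
$\cup\{X' \in \mathcal{P}_{k-1}(\omega) : \lambda_k(X',\omega) \le \epsilon\}$ is nonempty beginning from some 
k = $K_1(\omega) \ge K_0(\omega)$. Thus for $k \ge K_1(\omega)$ the record subset $X^k(\omega)$ 
is selected and this record subset is a singleton, $X^k(\omega) = x^k(\omega)$. Let 
$X^{k_m}(\omega) = x^\infty(\omega) \in \mathcal{P}_\infty(\omega)$ for some $k_m \ge K_1(\omega)$, m = 1, 2, .... 
Obviously, $x^\infty(\omega) \in X \cap D$. Since $\lambda_k(X^k(\omega),\omega) \le \epsilon$ for $k \ge K_1(\omega)$, 
then $\lambda_{k_m}(x^\infty(\omega),\omega) \le \epsilon$ for $k_m \ge K_1(\omega)$ and $H(x^\infty(\omega)) \le \epsilon$. Thus 
$x^\infty(\omega) \in X \cap D \cap C_\epsilon$ and 

    $F(x^\infty(\omega)) \le \max_{x \in X \cap D \cap C_\epsilon} F(x)$. 

On the other hand, by construction, $\eta_{k-1}(X^k(\omega),\omega) \ge \eta_{k-1}(X',\omega)$ for 
all $X' \in \mathcal{P}^\epsilon_{k-1}(\omega)$, $k \ge K_1(\omega)$, then 

    $F(x^\infty(\omega)) = \lim_m \eta_{k_m}(X^{k_m}(\omega),\omega) \ge \lim_m \eta_{k_m}(X',\omega)$ 

    $= U(X') \ge \max_{x \in X' \cap D} F(x)$, 

for $X' \in \mathcal{P}_\infty(\omega)$ such that $l(X') < \epsilon$. Since 

    $X \cap D \cap C \subset \cup\{X' \in \mathcal{P}_\infty(\omega) : l(X') < \epsilon\}$, 

then 

    $F(x^\infty(\omega)) \ge \max_{X'}\{\max_{x \in X' \cap D} F(x) : X' \in \mathcal{P}_\infty(\omega),\ l(X') < \epsilon\}$ 

    $\ge \max_{x \in X \cap D \cap C} F(x)$. 

Now prove the statement (b) of the theorem. For $\omega \in \Omega'$ beginning 
from $k \ge K_1(\omega)$ the set $\mathcal{P}^\epsilon_{k-1}(\omega)$ is nonempty. So the approximate 
solutions $Y^k(\omega)$ are selected and 

    $\lambda_k(Y^k(\omega),\omega) \le \epsilon$.    (19) 

By construction of the algorithm 

    $\xi_{k-1}(Y^k(\omega),\omega) \ge \xi_{k-1}(X^k(\omega),\omega)$,    (20) 

and by (a) 

    $\liminf_k \xi_k(X^k(\omega),\omega) \ge \max_{x \in X \cap D \cap C} F(x)$.    (21) 

Let for some $k_m \ge K_1(\omega)$ $Y^{k_m}(\omega)$ be a singleton and $Y^{k_m}(\omega) = 
y^\infty(\omega) \in \mathcal{P}_\infty(\omega)$, m = 1, 2, .... Obviously, $y^\infty(\omega) \in X \cap D$. Then 
by conditions (vii), (viii), (21) we obtain 

    $H(y^\infty(\omega)) = l(y^\infty(\omega)) = \lim_m \lambda_{k_m}(Y^{k_m}(\omega),\omega) \le \epsilon$ 

and thus $F(y^\infty(\omega)) \le \max_{x \in X \cap D \cap C_\epsilon} F(x)$. By (ii), (v), (20) and 
statement (a) we obtain 

    $F(y^\infty(\omega)) = L(y^\infty(\omega)) = \liminf_m \xi_{k_m}(Y^{k_m}(\omega),\omega)$ 

    $\ge \liminf_m \xi_{k_m}(X^{k_m}(\omega),\omega)$ 

    $\ge \max_{x \in X \cap D \cap C} F(x)$. 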

7 Numerical Experiments 

In this section we present the results of some numerical experiments 
with pollution control problems from Section 2. 

7.1 Optimization of Probabilities Subject to the Budget 
Constraint 

The following problem of dimensions n = 5, 10 was solved by 
exhaustive enumeration and by the Stochastic Branch and Bound 
Method: 

    $\max_x \, [F(x, q) = P\{\sum_{j=1}^{n} t_j(\theta) x_j \le q\}]$ 

subject to 

    $G(x) = \sum_{j=1}^{n} c_j(x_j) \le r$  (r = 4.0, 8.0), 

    $x_j \in [0, 1]$, j = 1, ..., n, 

where 

    $c_j(x_j) = \max_k (b_{jk} - a_{jk} x_j)$, j = 1, ..., n, 

the random variable $\theta$ takes on integer values 1, 2, ..., 10 with 
probabilities $p_\theta$; the probabilities $p_\theta$ are first randomly chosen from [0,1] and then 
normalized to satisfy $\sum_{\theta=1}^{10} p_\theta = 1$; the coefficients $t_j(\theta)$, $a_{jk}$ are 
randomly chosen from the interval [0, 1], and q = 0.5. 

The Stochastic Branch and Bound Method of Section 4 was 
implemented by means of Microsoft (R) FORTRAN Optimizing Compiler 
5.00. The computations were carried out on an IBM PC AT 386 DX40 
(with i387 co-processor). 

Note that due to the finite number of scenarios 9 one can calculate 
values of F{x) and bounds L, U exactly. For the subset 

X' = {x e R^\ c < X < d (componentwise)}, c, d G 

we take 

U(x') = I 

^ 1 — oo, otherwise; 

L(X') = [ + ifG'(rf)<r, 

^ 1 — OO, otherwise, 

where C is the maximal t such that G((l — t)c + td) < r. 

Firstly, this problem was solved by the exhaustive examination of 
the 0.1-step lattice (for n = 5, r = 4.0) and the 0.2-step lattice (for n = 10, 
r = 8.0). It took 1 min 23.43 sec and 3 hr 55 min 35.89 sec, respectively. 

Then the problem (with n = 5, r = 4.0) was solved by the Stochas- 
tic Branch and Bound Method over the 0.1-step and 0.01-step lattices. 
It took 5.49 sec (167 branching iterations, maximum 52 subproblems 
in the list) for the 0.1-lattice and 45.80 sec (1279 branching iterations, 
maximum 389 subproblems in the list) for the 0.01-lattice. 

In one more experiment the problem (with n = 10, r = 8.0) was 
solved by the Branch and Bound Method over the 0.2-step lattice. It took 
3 min 27.23 sec (3641 branching iterations, maximum 633 subproblems 
in the list), that is, approximately 68 times faster than the exhaustive 
enumeration. 
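
The reported speed-ups follow directly from the running times. The short check below reads the exhaustive run for n = 10 as 3 hr 55 min 35.89 sec, which is an assumption about the printed figure; the other times are taken from the text as given.

```python
def seconds(hours=0, minutes=0, secs=0.0):
    """Convert a running time to seconds."""
    return 3600 * hours + 60 * minutes + secs


# Exhaustive enumeration versus branch and bound on the same lattices.
exhaustive_n5 = seconds(minutes=1, secs=23.43)
exhaustive_n10 = seconds(hours=3, minutes=55, secs=35.89)   # assumed reading
bb_n5 = seconds(secs=5.49)
bb_n10 = seconds(minutes=3, secs=27.23)

speedup_n5 = exhaustive_n5 / bb_n5
speedup_n10 = exhaustive_n10 / bb_n10

if __name__ == "__main__":
    print(round(speedup_n5, 1), round(speedup_n10, 1))
```

Under this reading the ratio for n = 10 comes out close to the factor of 68 quoted in the text, and the ratio for n = 5 on the 0.1-step lattice is about 15.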

7.2 Cost Minimization Subject to the Reliability Con- 
straint 

The following problem was solved over the 0.1-step lattice by the 
Stochastic Branch and Bound Algorithm of section 6.2: 

    $\min_x G(x)$, 

subject to 

    $F(x, 2.5) \ge 0.999$, 

    $x_j \in [0, 1]$, j = 1, ..., 10, 

where F(x, q) and G(x) are the same as in section 7.1. 

To achieve accuracy 1% (in function) it took 53 sec (1144 branching 
iterations, 569 subproblems in the queue). 

Abstract. The problem of checking robust stability of interval matrices has 
been proved to be NP-hard. However, a closely related problem can be 
effectively solved in the framework of the stochastic approach [1]. Moreover, the 
deterministic interval robust stability radius happens to be very conservative 
for large dimensions from the probabilistic point of view. 

Keywords. Robustness, stability, interval matrices, random matrices, prob- 
abilistic approach. 

1 Introduction 

The first attempts to check robust stability of interval matrices (i.e. to check 
that all matrices of the form $\underline{A} \le A \le \overline{A}$ are Hurwitz; the inequalities 
are understood in the component-wise sense) were made just after the 
publication of the famous Kharitonov theorem on robust stability of interval 
polynomials. These attempts happened to be unsuccessful: the equivalent 
of the Kharitonov theorem does not hold for interval matrices [2]. Moreover, 
checking robust stability of edges and other low dimensional faces of the box 
of entries cannot guarantee stability of the entire family of matrices [3]. The 
final point in this series of “negative” results was the proof of NP-hardness 
of the problem [4]. Meanwhile there exist numerous sufficient conditions for 
robust stability of interval matrices, most of them being very conservative. 
The references as well as recent computational results for low-dimensional 
matrices can be found in [5]. 

Our aim is to demonstrate that passing from the deterministic to the 
stochastic point of view can change the situation drastically. The general 
idea of such an approach to robustness can be found in [1]; here we demonstrate 
the techniques of the approach applied to robust stability of interval matrices. 
It is based on two fundamental results. The first one relates to probability 
theory: it is Geman’s theorem on the norm of random matrices [6]. The 
second one is the famous recent formula for computation of the real stability 
radius due to Qiu et al. [7]. 

In Section 2 we provide Geman’s result and its nonasymptotic extension. 
Section 3 contains the main formula for calculation of the stochastic interval 
stability radius. The examples are given in Section 4; they demonstrate that 
the stochastic approach allows one to extend significantly the stability margin 
compared with the standard deterministic characteristics. We discuss the 
results and the directions for future research in Section 5. 

2 Norms of random matrices 

Let $a_{ij}$, i = 1, ..., j = 1, ... be independent identically distributed (i.i.d.) 
real random variables with 

    $E a = 0$, $E a^2 = \sigma^2$, $E a^4 < \infty$, 

here $a$ stands for any $a_{ij}$. Consider a random matrix $A_n$ with entries $a_{ij}$, i = 
1, ..., n, j = 1, ..., n. Denote 

    $v_n = \frac{1}{\sigma\sqrt{n}} \|A_n\|$, 

where the norm is understood as the operator one: $\|A\| = \sup_{\|x\|=1} \|Ax\|$, $\|x\|^2 = \sum_i x_i^2$. 

Fact 1. For $n \to \infty$ the norms of random matrices converge to 2 with 
probability 1: 

    $v_n \to 2$ a.s. 

This result is mainly due to Geman [6]; the superfluous condition on the 
existence of all moments of entries in [6] has been relaxed in [8]. 

The behavior of the eigenvalues of random matrices is also known. 

Fact 2 [9], [10]. The distribution of the eigenvalues of the matrix $B_n = 
\frac{1}{\sigma\sqrt{n}} A_n$ tends to uniform on the unit disk as $n \to \infty$. 

Now suppose that the entries of $A_n$ are uniformly distributed on $[-\gamma, \gamma]$ 
(denote it as $a \in R[-\gamma, \gamma]$). Then $\sigma = \gamma/\sqrt{3}$. For this case Monte-Carlo 
simulation provides the estimates for the rate of convergence in Geman’s 
theorem: 

Evn^2- Var(t;„) 0.22n-^‘^l 

Moreover the following nonasymptotic result is valid. 

Proposition 1. The probability for the norm of a random matrix to exceed 
the given level is 

    $P(v_n > 2.1) < 0.01$, 

    $P(v_n > 2.2) < 0.002$, 

uniformly over all n. 
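
The normalized norm is easy to simulate. The sketch below draws one matrix with entries uniform on [-gamma, gamma] and estimates its operator norm by power iteration on the product of the transposed matrix with the matrix. It uses only the standard library; the dimension, the seed, the iteration count and the function names are arbitrary assumptions, and power iteration yields an estimate from below rather than the exact norm.

```python
import math
import random


def operator_norm(a, iterations=200):
    """Estimate the operator norm sup ||A x|| over unit x by power iteration."""
    n = len(a)
    x = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        y = [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]   # y = A x
        z = [sum(a[i][j] * y[i] for i in range(n)) for j in range(n)]   # z = A^T y
        length = math.sqrt(sum(v * v for v in z))
        x = [v / length for v in z]
    y = [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
    return math.sqrt(sum(v * v for v in y))


def normalized_norm(n, gamma, rng):
    """v_n = ||A_n|| / (sigma sqrt(n)) for i.i.d. entries uniform on [-gamma, gamma]."""
    a = [[rng.uniform(-gamma, gamma) for _ in range(n)] for _ in range(n)]
    sigma = gamma / math.sqrt(3.0)
    return operator_norm(a) / (sigma * math.sqrt(n))


if __name__ == "__main__":
    print(normalized_norm(30, 1.0, random.Random(0)))
```

Repeating the draw many times and counting how often the value exceeds 2.1 or 2.2 gives an empirical check of the tail bounds in Proposition 1.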

3 Robust stability of interval matrices 

Assume M is a real n x n stable matrix, that is, all its eigenvalues have negative 
real parts. Its real stability radius is defined as 

    $r_R(M) = \inf\{\gamma : M + A \text{ is unstable},\ \|A\| \le \gamma\}$. 

The algorithm for computation of $r_R(M)$ has been proposed in [7]. The 
interval stability radius of M is 

    $r_I(M) = \inf\{\gamma : M + A \text{ is unstable},\ |a_{ij}| \le \gamma,\ \forall i, j\}$. 

It is not hard to get the estimate $n r_I(M) \ge r_R(M)$, which is tight for some 
matrices. However in general this estimate is too conservative. 

Now let us introduce the stochastic stability radius $r_S(M)$. Fix some 
confidence level $0 < \alpha < 1$, e.g. $\alpha = 0.99$. Then 

    $r_S(M) = \inf\{\gamma : P\{M + A \text{ is unstable}\} > 1 - \alpha,\ a_{ij} \in R[-\gamma, \gamma]\}$. 

Here it is assumed that $a_{ij}$ are i.i.d. entries of A. We choose the uniform 
distribution for them; there is a strong reasoning that such a distribution is in 
a sense the worst one [11]. The main result of the paper is an immediate 
corollary of Proposition 1 and the above definition. 

Proposition 2. For confidence level 0.99 and all n 

    $r_S(M) \ge \rho_1 = (c_1/\sqrt{n})\, r_R(M)$, $c_1 = \sqrt{3}/2.1 = 0.825\ldots$, 

and for confidence level 0.998 and all n 

    $r_S(M) \ge \rho_2 = (c_2/\sqrt{n})\, r_R(M)$, $c_2 = \sqrt{3}/2.2 = 0.787\ldots$ 

Thus to calculate the estimate for the stochastic stability radius it suffices 
to find the real stability radius of a matrix via the algorithm from [7]. After 
calculation of $\rho_1$, $\rho_2$ we can guarantee that if the perturbations of all entries 
of M do not exceed $\rho_1$ ($\rho_2$) then with probability not less than 0.99 (0.998) 
the stability is preserved. 
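
Once the real stability radius is known, the bounds of Proposition 2 are a one-line computation. The sketch below evaluates them; the function name and argument names are hypothetical, and the real stability radius is supplied by the caller rather than computed. The usage example takes the negative unit matrix of dimension 40, whose real stability radius is 1.

```python
import math


def stochastic_radius_bound(real_radius, n, norm_level):
    """Lower bound (sqrt(3) / norm_level) / sqrt(n) * r_R(M) from Proposition 2.

    norm_level = 2.1 corresponds to confidence 0.99, norm_level = 2.2 to 0.998.
    """
    return math.sqrt(3.0) / norm_level / math.sqrt(n) * real_radius


if __name__ == "__main__":
    # M = -I with n = 40: r_R(M) = 1, interval radius r_I(M) = 1/40 = 0.025.
    print(stochastic_radius_bound(1.0, 40, 2.1))   # bound at confidence 0.99
    print(stochastic_radius_bound(1.0, 40, 2.2))   # bound at confidence 0.998
```

The constant sqrt(3) appears because entries uniform on [-gamma, gamma] have standard deviation gamma / sqrt(3), so a norm level of 2.1 in Proposition 1 translates into a perturbation norm of 2.1 gamma sqrt(n/3).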

4 Examples 

Example 1. Consider the negative unit matrix as M: M = -I. Then both 
stability radii can be computed explicitly: $r_R(M) = 1$, $r_I(M) = 1/n$, where 
n is the dimension of M. For n = 40 Proposition 2 yields $r_S(M) \ge 
\rho_1 = 0.13$, while $r_I(M) = 0.025$. Moreover $r_S(M)$ can be estimated more precisely 
for this example. Fact 2 provides that the eigenvalues of M + A = -I + A 
(where A is a random matrix with entries uniformly distributed on $[-\gamma, \gamma]$) are 
approximately uniformly distributed in the disk centered at -1 and having 
the radius $\gamma\sqrt{40/3}$. Hence real parts of the eigenvalues are negative with 
large probability iff $\gamma < \sqrt{3/40} = 0.274$, and $r_S(M) \approx 0.274$. The direct 
Monte-Carlo simulation provides $r_S(M) = 0.25$ with confidence level 0.997. 
Thus the true interval stability radius is approximately 10 times more conservative 
than its stochastic counterpart! In other words, if we are ready to tolerate 
instability with probability 0.003, then we can extend the margin for interval 
perturbations 10 times for this example. 
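
The disk argument of Example 1 can be written out step by step. With entries uniform on $[-\gamma, \gamma]$ the standard deviation is $\sigma = \gamma/\sqrt{3}$, so by Fact 2 the spectrum of A approximately fills a disk of radius $\sigma\sqrt{n}$, and shifting by -I moves its center to -1:

```latex
\[
  \sigma\sqrt{n} \;=\; \frac{\gamma}{\sqrt{3}}\,\sqrt{40} \;=\; \gamma\sqrt{40/3},
  \qquad
  -1 + \gamma\sqrt{40/3} < 0
  \;\iff\;
  \gamma < \sqrt{3/40} \approx 0.274 ,
\]
\[
  \frac{r_S(M)}{r_I(M)} \;\approx\; \frac{0.25}{0.025} \;=\; 10 .
\]
```

The first line is the condition that the rightmost point of the disk stays in the open left half-plane; the second line is the ratio behind the factor of 10, using the Monte-Carlo value 0.25.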

Example 2. The case of Metzlerian matrices (i.e. the matrices M with 
entries $m_{ii} < 0$, $m_{ij} \ge 0$, $i \ne j$) gives the opportunity to calculate stability 
radii explicitly [12], [13], [14]. Consider the example: 

    n = 40, $m_{ii} = -60$, $m_{ij} = 2$, i < j, $m_{ij} = 1$, i > j, i, j = 1, ..., 40. 

The eigenvalues of M can be calculated analytically: 

    $\lambda_k = -60 + (\tau\epsilon_k - 2)/(1 - \tau\epsilon_k)$, 

where $\tau = \sqrt[40]{2}$ and $\epsilon_k$ are the values of $\sqrt[40]{1}$. Hence $\max_k \mathrm{Re}(\lambda_k) = -3.79$ and 
M is stable. The formula for the real stability radius of Metzlerian matrices 
is equivalent to $r_R(M) = \sigma_1(M) = 3.535$, $\sigma_1(M)$ being the least singular 
value of M. For interval Metzlerian matrices the violation of stability is 
met first when all values of perturbations are positive and have the largest 
values, i.e. a matrix A with entries $|a_{ij}| \le \gamma$ can be destabilizing if $a_{ij} = 
\gamma$. This yields $r_I(M) = 0.0913$ for M as above. We get $\rho_1 = 0.484$, and 
Monte-Carlo calculation gives $r_S(M) = 2$ with confidence level 0.995. Again 
the ratio $r_S(M)/r_I(M)$ is large, demonstrating the conservativeness of the 
deterministic approach. 

Example 3. A Hurwitz polynomial of degree n with stable zeros was 
randomly generated, then a corresponding companion form matrix B was 
constructed and it was taken $M = C^{-1}BC$, where C was a random 
nonsingular matrix. The calculations via the algorithm of [7] provided $r_R(M)$, and 
the formulae of Proposition 2 were used to calculate $\rho_1$, $\rho_2$. Then random 
matrices A with i.i.d. entries uniformly distributed on $[-\rho, \rho]$ were generated 
and the stability of the perturbed matrices M + A was checked. The 
calculations confirmed that the probability of instability was generally less than 
0.001 even for $\rho = \rho_1$. Incidentally it was found that serious computational 
problems arise for the algorithm of [7] for moderate dimensions ($n \sim 20$). 

5 Discussion 

The results of the computation confirm the fact that the stochastic approach 
allows one to extend significantly the deterministic interval stability radius. 

The proposed formula provides a lower bound for $r_S(M)$; the true value of 
$r_S(M)$ may be much greater (compare the results in Examples 1, 2). This is 
explainable: we guarantee (with the given confidence level) that the norm of 
the perturbation does not exceed the real stability radius, whereas a larger 
norm does not necessarily imply instability. Thus one of the directions 
for future research is to develop a new version of the stochastic approach, not 
relying on the formula for the real stability radius, which provides tighter 
bounds for the stochastic stability radius. 

Another limitation of the proposed technique is the assumption on equality 
of ranges of perturbations (the bound $\gamma$ is the same for all entries of A). 
It is an open problem to get rid of this assumption. 

Abstract: Preprocessing in two-stage stochastic programming is 
considered from the viewpoint of Fourier-Motzkin elimination. Although of 
exponential complexity in general, Fourier-Motzkin elimination is shown to provide 
valuable insights into specific topics such as solving integer recourse stochastic 
programs or verifying stability conditions. Test runs with the computer code 
PORTA [5] are reported. 

1 Introduction 

Two-stage stochastic programs arise as deterministic equivalents to random 
optimization problems that are characterized by a two-stage scheme of altern- 
ate decision and observation. First, a here-and-now decision has to be taken 
without knowing the outcomes of random problem data. After realization of 
the random data, a second-stage (recourse) decision is possible, which is the 
solution of a subordinate optimization problem depending on the first-stage 
solution and the outcome of the random data. In this paper, we consider 
problems where the second-stage is a linear program with possibly integer 
requirements on the variables. In two-stage stochastic programming one en- 
counters an interplay of algebraic and probabilistic difficulties. Preprocessing 
in stochastic programming is directed to analyzing the underlying algebraic 
structures. This may be helpful for supporting solution procedures or im- 
proving structural understanding of the problem. 

In this note, we consider stochastic programs of the following form 

mm{c^ X Q{x) : x £ C} (1) 
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where 

Q[x)- f ^{z - Ax)fi{dz) (2) 

Jn^ 

and 

^{t) — min{q^y : Wy >t, y £Y} with Y = TZ^ or Y = . (3) 

Here, $C \subset \mathbb{R}^n$ is a non-empty polyhedron and $c$, $q$, $W$ are vectors and matrices
of proper dimensions. The above mentioned scheme of alternate decision and
observation is reflected by the first-stage variables $x$, the random vector $z$
with underlying probability measure $\mu$, and the second-stage variables $y$. The
two-stage stochastic program aims at finding a first-stage decision $x$ such that
the sum of direct costs $c^T x$ and expected recourse costs $Q(x)$ is minimal. For
further reading on basics in two-stage stochastic programming we refer to
[8, 9]. Let us assume that $Y = \mathbb{R}^m_+$ and impose the following assumptions:



the matrix $W$ has full rank, (4)

$$M_D = \{u \in \mathbb{R}^s_+ : W^T u \le q\} \neq \emptyset, \qquad (5)$$

$$\int_{\mathbb{R}^s} \|z\|\,\mu(dz) < +\infty. \qquad (6)$$

These assumptions imply that $M_D$ has vertices, and according to the decom-
position theorem for polyhedra it admits a representation

$$M_D = \mathrm{conv}(V_0) + \mathrm{cone}(V_1) \qquad (7)$$

where $V_0$, $V_1$ are finite sets of vectors in $\mathbb{R}^s$ and conv and cone denote the
convex and conical hulls, respectively. Preprocessing, as discussed in this
note, concerns the algorithmic transformation of the representation in (5)
into the one of (7). To see that explicit knowledge of $V_0$, $V_1$ can be beneficial,
recall the following identities:

$$\begin{aligned}
\mathrm{cone}(V_1) &= \{u \in \mathbb{R}^s_+ : W^T u \le 0\} \\
&= \{u \in \mathbb{R}^s : u^T w_i \le 0,\ i = 1,\dots,m,\ \ u^T(-e_j) \le 0,\ j = 1,\dots,s\} \\
&= \{u \in \mathbb{R}^s : u^T\big(\textstyle\sum_i \lambda_i w_i + \sum_j \lambda'_j(-e_j)\big) \le 0 \ \ \forall \lambda_i \ge 0,\ \lambda'_j \ge 0\} \\
&= \{u \in \mathbb{R}^s : u^T w \le 0 \ \ \forall w \in \mathrm{pos}(W, -I)\} \\
&= (\mathrm{pos}(W, -I))^*
\end{aligned}$$

The first equality holds since $\mathrm{cone}(V_1)$ is the recession cone of $M_D$; in the
second line, $w_i$ are the columns of $W$ and $e_j$ the canonical unit vectors. Here
pos denotes the positive span and $*$ indicates the polar cone. Therefore, the
elements of $V_1$ are the coefficients in an inequality description of
$\mathrm{pos}(W, -I)$. In computations, the probability measure $\mu$ is in general assumed
to be discrete, i.e., with mass points $z_1, \dots, z_L$ and probabilities
$p_1, \dots, p_L$. Then, $Q(x)$ is well defined if $z_l - Ax \in \mathrm{pos}(W, -I)$ for all
$l = 1, \dots, L$. In the literature, the latter are called induced constraints.
These are implicit conditions that are made explicit if $V_1$ is known. Moreover,
for $t \in \mathrm{pos}(W, -I)$ it holds that

$$\Phi(t) = \max_{v_i \in V_0} v_i^T t \qquad (8)$$

such that computation of $\Phi$ becomes easy if $V_0$ is known, and the domain of
definition of $\Phi$ becomes explicit if $V_1$ is known.

Nevertheless, the above considerations have only limited impact on solution
procedures for state-of-the-art linear recourse problems (i.e., if $Y = \mathbb{R}^m_+$),
since far too many elements arise in $V_0$ and $V_1$. Methods like L-shaped, reg-
ularized or stochastic decomposition [17, 11, 7], for instance, generate only
those elements of $V_0$ and $V_1$ that are relevant for the solution process. On the
other hand, the above transformation may be useful to support solution pro-
cedures for smaller problems of more complicated nature (e.g., if $Y = \mathbb{Z}^m_+$) or
for answering questions in the theory of stochastic programming (e.g., verify-
ing stability conditions). The emphasis of our paper is on these two specific
issues. A more general view on preprocessing including its impact on model-
ing is adopted in [18, 19] and Chapter 5 in [8].

In Section 2, we recall the role of Fourier-Motzkin elimination when trans- 
forming an inequality description of a polyhedron into the representation 
as Minkowski sum of a convex and a conical hull. In Section 3, we show 
how preprocessing via Fourier-Motzkin elimination enters into an algorithm 
for integer recourse stochastic programs. Section 4 deals with the verification
of stability conditions for stochastic programs with linear recourse. Here
Fourier-Motzkin elimination can be beneficial in generating information about
the polyhedral complex of lineality regions of the second-stage value function
$\Phi$. Finally, we have a conclusions section.

2 Theoretical Background 

In this section we recall the essence of Fourier-Motzkin elimination and put 
it into the context of computing extreme points and extreme rays of the poly-
hedron $M_D$ (cf. (5)). Enumeration of extreme points and extreme rays of
a polyhedron is a well studied problem in the literature (see, e.g., [21] and 
the references therein). In connection with stochastic programming, an ex- 
cellent account is given in [18]. Our intention here is to show the relation 
to Fourier-Motzkin elimination, which is interesting from the practical com- 
putations point of view, since there exists a freely available implementation 
([5]) of the transformation procedure described below. This supplements the 
implementations reported in [18]. 

For a polyhedron $\Pi = \{x \in \mathbb{R}^n : Ax \le b\}$, Fourier-Motzkin elimination
provides an algorithmic way for projection along the $k$-th coordinate
($1 \le k \le n$), i.e., for eliminating the variable $x_k$ from the system $Ax \le b$.
Let $a_i$, $b_i$ be the rows of $A$ and the components of $b$, respectively, and
let $a_{ik}$ denote the $k$-th component of $a_i$. We introduce subsets $I_>$, $I_<$, $I_=$ of
the set of row indices such that

$$a_{ik} > 0 \ \text{for all } i \in I_>, \qquad a_{ik} < 0 \ \text{for all } i \in I_<, \qquad a_{ik} = 0 \ \text{for all } i \in I_=.$$

Simple manipulations then provide for any $x \in \Pi$

$$x_k \le \frac{1}{a_{ik}}\,\big(a_{ik}x_k - a_i^T x + b_i\big) \quad \text{for all } i \in I_>$$

and

$$x_k \ge \frac{1}{-a_{ik}}\,\big({-a_{ik}}x_k + a_i^T x - b_i\big) \quad \text{for all } i \in I_<.$$

The right-hand sides in these inequalities are independent of $x_k$ such that we
obtain for the projection $\Pi^{(k)}$ of $\Pi$ along the $k$-th coordinate

$$\Pi^{(k)} = \Big\{ x^{(k)} \in \mathbb{R}^{n-1} :\
\max_{i \in I_<} \frac{1}{-a_{ik}}\big({-a_{ik}}x_k + a_i^T x - b_i\big) \le \min_{i \in I_>} \frac{1}{a_{ik}}\big(a_{ik}x_k - a_i^T x + b_i\big),\ \
a_i^T x \le b_i \ \text{for all } i \in I_= \Big\},$$

where $x^{(k)}$ denotes the vector $x$ with its $k$-th component removed.

The first inequality in the above description of $\Pi^{(k)}$ can be equivalently
expressed by $|I_>| \cdot |I_<|$ many linear inequalities, where $|\cdot|$ denotes cardinality.
When projecting further down to smaller dimensions, the above scheme has
to be iterated, which in general produces inequality systems of enormous
size. This prevents algorithmic use of the method for large-scale problems.
Strategies going back to Tschernikow ([16], see also [6]) make it possible to produce
a description of $\Pi^{(k)}$ without redundant inequalities such that in practical
computations intermediate inequality systems can be kept as small as possible.
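
The single elimination step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the PORTA implementation: the function name `fm_eliminate` and the row format `(a, b)` for an inequality $a^T x \le b$ are assumptions made here, exact rational arithmetic is used for convenience, and no redundancy removal (Tschernikow rules) is attempted.

```python
from fractions import Fraction
from itertools import product


def fm_eliminate(rows, k):
    """Eliminate variable k from the system a^T x <= b (hypothetical helper).

    rows: list of (a, b), a being a list of coefficients and b the right-hand side.
    Returns the rows of the projected system with column k dropped.
    """
    pos, neg, zero = [], [], []          # the index sets I_>, I_<, I_=
    for a, b in rows:
        a = [Fraction(v) for v in a]
        b = Fraction(b)
        (pos if a[k] > 0 else neg if a[k] < 0 else zero).append((a, b))

    def drop(a):
        return a[:k] + a[k + 1:]

    # Inequalities not involving x_k are kept unchanged.
    out = [(drop(a), b) for a, b in zero]
    # Every pair (upper bound from I_>, lower bound from I_<) yields one inequality,
    # i.e. |I_>| * |I_<| new rows in total.
    for (ap, bp), (an, bn) in product(pos, neg):
        coeffs = [u / ap[k] - v / an[k] for u, v in zip(ap, an)]
        out.append((drop(coeffs), bp / ap[k] - bn / an[k]))
    return out


# Triangle x >= 0, y >= 0, x + y <= 1; eliminating y leaves 0 <= x <= 1.
triangle = [([-1, 0], 0), ([0, -1], 0), ([1, 1], 1)]
projection = fm_eliminate(triangle, 1)
```

Iterating `fm_eliminate` over several variables reproduces the growth of the intermediate systems mentioned in the text, since each call can multiply the number of rows.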

Proposition 1. Let $M_D = \{u \in \mathbb{R}^s_+ : W^T u \le q\}$ be such that $0 \in M_D$;
then a representation $M_D = \mathrm{conv}(V_0) + \mathrm{cone}(V_1)$ can be computed by using
the above elimination procedure.

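Proof. We consider the polar polyhedron $M_D^*$, which is given as follows

$$M_D^* = \{v \in \mathbb{R}^s : v^T u \le 1 \ \ \forall u \in M_D\}.$$

Then it holds (cf. [12], Theorem 9.1) that $M_D^*$ is a polyhedron again and that
$M_D^{**} = M_D$. Let $M_D$ be written as

$$M_D = \Big\{u \in \mathbb{R}^s : (W, -I)^T u \le \binom{q}{0}\Big\}$$

where $W$ is scaled in such a way that $q_i \in \{0, 1\}$ for all components $q_i$ of
$q$ ($i = 1, \dots, m$). The latter is possible since $0 \in M_D$ implies $q \ge 0$. Another
standard result on polarity (cf. again [12], Theorem 9.1) now yields that

$$M_D^* = \mathrm{conv}\Big(\{0\} \cup \bigcup_{q_i = 1}\{w_i\}\Big) + \mathrm{cone}\Big(\bigcup_{q_i = 0}\{w_i\} \cup \bigcup_{j=1}^{s}\{-e_j\}\Big)$$

where $w_i$ and $e_j$ are as in Section 1. Therefore

$$M_D^* = \Big\{u \in \mathbb{R}^s : \exists\, \lambda \ge 0,\ \mu^1 \ge 0,\ \mu^2 \ge 0 \ \text{such that}\
u = \sum_i \lambda_i w_i + \sum_k \mu^1_k w_k + \sum_j \mu^2_j(-e_j),\ \ \sum_i \lambda_i = 1\Big\}.$$

Here $i$ runs over the generators of the convex hull (the zero vector counted
among them), $k$ over the indices with $q_k = 0$, and $j$ over $1, \dots, s$. Hence

$$M_D^* = \widetilde{M}_D^*(\lambda, \mu^1, \mu^2)$$

where

$$\widetilde{M}_D^* = \Big\{(u, \lambda, \mu^1, \mu^2) : u = \sum_i \lambda_i w_i + \sum_k \mu^1_k w_k + \sum_j \mu^2_j(-e_j),\ \
\sum_i \lambda_i = 1,\ \lambda \ge 0,\ \mu^1 \ge 0,\ \mu^2 \ge 0\Big\} \qquad (9)$$

and $\widetilde{M}_D^*(\lambda, \mu^1, \mu^2)$ denotes the projection of $\widetilde{M}_D^*$ along $(\lambda, \mu^1, \mu^2)$.
Eliminating the variables $\lambda, \mu^1, \mu^2$ from the above system by the
Fourier-Motzkin procedure yields an inequality description

$$M_D^* = \{u : H_0 u \le h,\ H_1 u = 0\}$$

which has no redundant rows if we apply the above mentioned Tschernikow
rules. Since also $0 \in M_D^*$, we may assume that, after proper scaling, $h_i \in
\{0, 1\}$ for all components $h_i$ of $h$. Now, $M_D = M_D^{**}$, and again Theorem 9.1
in [12] implies

$$M_D = (M_D^*)^* = \mathrm{conv}\Big(\{0\} \cup \bigcup_{h_i = 1}\{h_{0i}\}\Big)
+ \mathrm{cone}\Big(\bigcup_{h_i = 0}\{h_{0i}\} \cup \bigcup_i \{h_{1i}\} \cup \bigcup_i \{-h_{1i}\}\Big)$$

where $h_{0i}$, $h_{1i}$ are the rows of $H_0$, $H_1$, respectively. This is the desired
representation and our proof is complete.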

Recall that, due to our basic assumptions (4), (5), the polyhedron $M_D$ has
vertices. Therefore $H_1$ in the above proof has to be the zero matrix.

The main algorithmic step in the above proof, the elimination of the variables
$\lambda, \mu^1, \mu^2$ from the system in (9), is implemented in the code PORTA [5].
As input, the user has to supply an inequality description of the relevant
polyhedron, in our situation $W^T u \le q$, $u \ge 0$. The output of PORTA
contains the list of vertices and extreme rays, in our situation the row vectors
of $H_0$.

In [18] the authors report on numerical experience with the algorithm 
support ([20]) that, although different in appearance, follows similar principles 
as the Fourier-Motzkin procedure. Our intention with the above proposition 
is to point out the relation to Fourier-Motzkin elimination and to give an 
impression on the key procedure implemented in PORTA. 

The bottleneck of Fourier-Motzkin elimination, however, is that, in gen- 
eral, the size of the iteration system of linear inequalities grows quadratically 
per elimination of one variable. Moreover, Fourier-Motzkin elimination is an 
all-or-nothing procedure. The complete list of vertices and extreme rays is 
generated only in the very last step. If the method breaks down because the 
iteration system of linear inequalities becomes too big, then no partial list of 
vertices and extreme rays is available. 



3 Lower Bounds for Integer Recourse Problems 

The present section deals with two-stage stochastic programs where the
second stage problem is an integer linear program. The basic model is again
given by (1) - (3) but now $Y = \mathbb{Z}^m_+$. We also assume that all entries in $W$
are rational numbers, and (3) reads

$$\Phi(t) = \min\{q^T y : Wy \ge t,\ y \in \mathbb{Z}^m_+\}. \qquad (10)$$

This value function $\Phi$ is in general non-convex and discontinuous, in fact, lower
semicontinuous, and these properties of $\Phi$, obviously, are transferred to $Q$.
Therefore, algorithms for linear recourse problems with continuous variables,
essentially resting on convexity of $Q$, break down for this class of problems. If
the underlying probability measure $\mu$ is discrete, which we assume also in the
present section, then (1) - (3) may be equivalently rewritten as a large-scale
mixed-integer linear program with dual block angular structure. Tackling this
problem by primal decomposition leads to master problems whose objectives
are essentially governed by $\Phi$ and, again, we are facing lower semicontinuous
objectives ([2]).

In [15], an algorithm for the above integer recourse stochastic program is
proposed that combines enumeration of Q with an efficient procedure for
computing its function values. The latter employs Gröbner bases methods
from computational algebra: Using Buchberger's algorithm, a Gröbner basis
of a polynomial ideal related to the integer program in (10) is computed. This
basis only depends on the objective and the coefficient matrix of the integer
program. For the various right-hand sides, solution of the integer programs
then is accomplished by a scheme of generalized division of multivariate poly-
nomials. The latter is much faster than solving anew the integer program
with conventional methods each time another right-hand side arises. Com-
puting the Gröbner basis, however, is the bottleneck of the method such that
only second-stage problems of moderate size can be handled. For details of
Gröbner bases theory and its application to integer programming we refer to
[1, 3, 4].

As to the enumeration of Q, bounds restricting the search are most welcome.
Here, preprocessing does an important job which we will explain now. Con-
sider the continuous relaxation

$$\Phi_R(t) = \min\{q^T y : Wy \ge t,\ y \in \mathbb{R}^m_+\}$$

and the corresponding relaxed expected recourse function

$$Q_R(x) = \int_{\mathbb{R}^s} \Phi_R(z - Ax)\,\mu(dz).$$

Of course, $Q_R(x) \le Q(x)$, and therefore any optimal solution to the integer
recourse stochastic program belongs to the level set

$$\{x \in C : c^T x + Q_R(x) \le c^T \bar{x} + Q(\bar{x})\} \qquad (11)$$

where $\bar{x} \in C$ is an arbitrary feasible point. The enumeration part of the
algorithm in [15] rests on searching the above level sets. Each time a feasible
point $\bar{x}$ with improved objective function value is found the level set (11) can
be shrunk.

In view of the representation (8), the function $Q_R$ is convex piecewise-linear
if the measure $\mu$ is discrete. Therefore, all the level sets in (11) are non-empty
polyhedra. If we assume that $\mathrm{pos}(W, -I) = \mathbb{R}^s$ then $\Phi_R(t) = \max_{v_i \in V_0} v_i^T t$
where $V_0$ is the vertex set of $M_D$. This leads to the following lower bound for
$Q_R$:

$$\begin{aligned}
Q_R(x) &= \sum_{l=1}^{L} p_l\,\Phi_R(z_l - Ax) \\
&\ge \Phi_R\Big(\sum_{l=1}^{L} p_l z_l - Ax\Big) \\
&= \max_{v_i \in V_0} v_i^T\Big(\sum_{l=1}^{L} p_l z_l - Ax\Big) =: Q_{RL}(x).
\end{aligned}$$

Here, the second line is a consequence of Jensen's inequality. If we replace
$Q_R$ in (11) by $Q_{RL}$ then again any optimal solution to the integer recourse
stochastic program belongs to the respective level set. The advantage over
the previous situation is that $Q_{RL}$ is explicitly known via the vertices of $M_D$.
If the latter are obtained by the procedure described in the previous section,
then the enumeration part of the algorithm in [15] becomes algorithmically
feasible. The disadvantage that Fourier-Motzkin elimination breaks down at
large-scale instances is not of great significance in this case, since the above
mentioned bottleneck in Gröbner basis computation restricts application of
the algorithm in [15] to second-stage problems of moderate size.
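
To make the chain of (in)equalities for $Q_R$ and $Q_{RL}$ concrete, the following Python sketch evaluates both functions from a given vertex list. The function names and the toy data are assumptions made purely for illustration: $W$ is taken as the $2 \times 2$ identity and $q = (1, 1)$, so that $M_D = [0,1]^2$ and $\Phi_R(t) = \max(t_1, 0) + \max(t_2, 0)$; none of this is taken from the test instances in [15].

```python
def phi_r(vertices, t):
    """Phi_R(t) as the maximum of v^T t over the dual vertices v, cf. (8)."""
    return max(sum(vi * ti for vi, ti in zip(v, t)) for v in vertices)


def _times(A, x):
    """Matrix-vector product A x for A given as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def q_r(vertices, A, x, scenarios, probs):
    """Q_R(x) = sum_l p_l * Phi_R(z_l - A x) for a discrete measure."""
    Ax = _times(A, x)
    return sum(p * phi_r(vertices, [zi - ai for zi, ai in zip(z, Ax)])
               for z, p in zip(scenarios, probs))


def q_rl(vertices, A, x, scenarios, probs):
    """Jensen lower bound Q_RL(x) = Phi_R(sum_l p_l z_l - A x)."""
    Ax = _times(A, x)
    zbar = [sum(p * z[i] for z, p in zip(scenarios, probs)) for i in range(len(Ax))]
    return phi_r(vertices, [zb - ai for zb, ai in zip(zbar, Ax)])


# Toy data (assumed): vertices of M_D = [0,1]^2, one first-stage variable.
V0 = [(0, 0), (1, 0), (0, 1), (1, 1)]
A = [[1], [1]]
scenarios = [(0, 0), (2, 2)]
probs = [0.5, 0.5]
exact = q_r(V0, A, [1], scenarios, probs)    # 0.5*0 + 0.5*2
bound = q_rl(V0, A, [1], scenarios, probs)   # Phi_R((0, 0))
```

In this toy instance the bound is strict, which illustrates that $Q_{RL}$ is cheaper to evaluate than $Q_R$ but may enlarge the level set to be searched.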

As an example let us consider the following model that is used as a test
example in [15]

$$\max\{c^T x + Q(x) : x \in C\} \qquad (12)$$

where

$$Q(x) = \int_{\mathbb{R}^s} \Phi(z - Ax)\,\mu(dz) \qquad (13)$$

and

$$\Phi(t) = \max\{q^T y : Wy \le t,\ y \in \{0, 1\}^m\} \qquad (14)$$

with non-negative components in $c$, $q$ and $A$, $W$.

Although purely academic, the above model can be interpreted as a
two-stage knapsack problem with random budget where
decision variables (boolean and continuous ones) correspond to investments.
The first-stage investment decision $x$ in (12) - (14) is selected from some
feasible set $C$ and yields an immediate revenue $c^T x$. Further revenue is gained
from projects for which investment is done in the second stage after having 
observed the random vector z leading to the budget z — Ax. Spending 
money in the first stage decreases possibilities in the second stage. However, 
negative entries in x may be permitted leading to the possibility to contract 
loans in the first stage to enlarge possibilities in the second stage. The ob- 
jective in (12) - (14) is to find a first-stage investment decision x such that 
the sum of direct revenue from the first stage and expected revenue from the 
second stage is maximal. 

Computational experience with solving (12) - (14) is reported in [15]. Here,
we concentrate on the preprocessing part, i.e., on finding a representation (7)
for the dual polyhedron

$$M_D = \{u \in \mathbb{R}^{s+m}_+ : (W^T, I)\,u \ge q\}. \qquad (15)$$

Although the assumption that $\mathrm{pos}(W, I) = \mathbb{R}^s$ is not met here, such that $\Phi$ is
not defined on the whole of $\mathbb{R}^s$, vertices of $M_D$, nevertheless, can be used in
the above way to bound enumeration.

Our experience with PORTA on a SPARCstation 20 Model 61 with 160 
MB of main memory indicates that, within seconds, vertex sets with up to 
several hundreds of elements can be enumerated. If there are several hundred 
thousands of vertices, then there is a pretty high chance that PORTA breaks 
down due to excessive size of the iteration system. Vertex sets with up to one 
hundred thousand elements have a good chance to be enumerable, although 
this might cost several hours of CPU time. To illustrate these statements, 
we tested PORTA on some instances of the polyhedron (15). For a matrix 
W with 2 rows and 29 columns the complete list of 241 vertices was found 
after 2 seconds, for a 2 x 50 matrix W we ended up with 545 vertices after 14 
seconds and for a 4 x 50 matrix W it took 5550 seconds to find all the 37887
vertices.



4 Verification of Stability Conditions 

The purpose of this section is to demonstrate how Fourier-Motzkin elimination 
can be used to verify assumptions in theoretical considerations on stochastic 
programs. 

In recent years, studies on the stability of the stochastic program (1) -
(3) with respect to perturbations of the underlying measure $\mu$ have attracted
some interest. This is mainly motivated by the incomplete information on $\mu$
that one often faces in applications and by numerical problems that arise in
computations of the integral in (2) if $\mu$ is multivariate continuous. A crucial
issue in this analysis is that sufficient stability conditions are verifiable from
the data in the unperturbed problem, i.e., in our situation, from (1) - (3)
with some fixed measure $\mu$. In the following, we use a result on
the stability of optimal solution sets to illustrate how Fourier-Motzkin
elimination can be employed to extend verifiability of stability conditions.

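Let $Y = \mathbb{R}^m_+$ and consider problem (1) - (3) as a parametric program in $\mu$

$$P(\mu) \qquad \min\{c^T x + Q_\mu(x) : x \in C\} \qquad (16)$$

where

$$Q_\mu(x) = \int_{\mathbb{R}^s} \Phi(z - Ax)\,\mu(dz) \qquad (17)$$

and

$$\Phi(t) = \min\{q^T y : Wy = t,\ y \in \mathbb{R}^m_+\}. \qquad (18)$$

(The only reason for writing the second-stage linear program in equality form
is consistency with the settings in [10], [13].)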

The following proposition was established in [10]. It provides a Lipschitz
estimate for the Hausdorff distance $d_H(\psi(\mu), \psi(\nu))$ of the optimal solution
sets $\psi(\mu)$, $\psi(\nu)$ to the stochastic programs $P(\mu)$ and $P(\nu)$, respectively. The
estimate is in terms of some distance of probability measures $d(\mu, \nu; U)$ that
we will not explain here. For details we refer to [10] where it is also shown
that $d(\mu, \nu; U)$ can be majorized by the uniform distance of distribution
functions of probability measures closely related to $\mu$ and $\nu$. The function
$\tilde{Q}_\mu$ arising in the statement is defined by
$\tilde{Q}_\mu(\chi) = \int_{\mathbb{R}^s} \Phi(z - \chi)\,\mu(dz)$.

Proposition 2. Let $\mathrm{pos}\,W = \mathbb{R}^s$, $M_D \neq \emptyset$ and $\int_{\mathbb{R}^s} \|z\|\,\mu(dz) < +\infty$. Sup-
pose further that $\psi(\mu)$ is non-empty and bounded. Assume that there exists
a convex open subset $V$ of $\mathbb{R}^s$ such that $A(\psi(\mu)) \subset V$ and the function $\tilde{Q}_\mu$ is
strongly convex on $V$. Let $U = \mathrm{cl}\,U_0$, where $U_0$ is an open, convex, bounded
set such that $\psi(\mu) \subset U_0$ and $A(U) \subset V$.

Then there exist constants $L > 0$, $\delta > 0$ such that

$$d_H(\psi(\mu), \psi(\nu)) \le L \cdot d(\mu, \nu; U)$$

for all probability measures $\nu$ such that $\int_{\mathbb{R}^s} \|z\|\,\nu(dz) < +\infty$ and
$d(\mu, \nu; U) < \delta$.

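As to verification of assumptions, the critical part of the above statement
is the strong convexity of $\tilde{Q}_\mu$, which means that there exists some $\kappa > 0$ such
that for all $\chi, \chi' \in V$ and all $\lambda \in [0, 1]$

$$\tilde{Q}_\mu(\lambda\chi + (1-\lambda)\chi') \le \lambda\tilde{Q}_\mu(\chi) + (1-\lambda)\tilde{Q}_\mu(\chi') - \kappa\lambda(1-\lambda)\|\chi - \chi'\|^2.$$

Strong convexity of $\tilde{Q}_\mu$ can be verified via the following result from [13].

Proposition 3. Let $\mathrm{pos}\,W = \mathbb{R}^s$, the interior of $M_D$ be non-empty and
$\int_{\mathbb{R}^s} \|z\|\,\mu(dz) < +\infty$. Suppose further that there exist a convex open set
$V \subset \mathbb{R}^s$, constants $r > 0$, $\rho > 0$ as well as a density $\theta$ of $\mu$ such that $\theta(t') \ge r$
for all $t' \in V_\rho := \{t' \in \mathbb{R}^s : \mathrm{dist}(t', V) \le \rho\}$. Then $\tilde{Q}_\mu$ is strongly convex on
$V$.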

The density assumption in the above proposition restricts application of
the result to stochastic programs (16) - (18) where all components of $z$ are
random, i.e., not constant almost surely. This is quite restrictive in applica-
tions, and [14] analyses models where only a part of $z \in \mathbb{R}^s$, say $z_1 \in \mathbb{R}^{s_1}$,
is random, and the remaining part $z_2 \in \mathbb{R}^{s_2}$ ($s_1 + s_2 = s$) is fixed. Then $Q_\mu$
becomes

$$Q_\mu(x) = \int_{\mathbb{R}^{s_1}} \Phi^\circ(z_1 - A_1 x,\ z_2 - A_2 x)\,\mu(dz_1)$$

with

$$\Phi^\circ(t_1, t_2) = \min\{q^T y : W_1 y = t_1,\ W_2 y = t_2,\ y \in \mathbb{R}^m_+\}. \qquad (19)$$

Consider

$$\tilde{Q}_\mu(\chi_1, \chi_2) = \int_{\mathbb{R}^{s_1}} \Phi^\circ(z_1 - \chi_1,\ z_2 - \chi_2)\,\mu(dz_1)$$

which, for fixed $\chi_2$, is studied in [14] as a function of the first argument $\chi_1$

$$\tilde{Q}_\mu(\chi_1) = \int_{\mathbb{R}^{s_1}} \Phi^\circ(z_1 - \chi_1,\ z_2 - \chi_2)\,\mu(dz_1). \qquad (20)$$

The key result in [14] is a sufficient condition for strong convexity of the
above function $\tilde{Q}_\mu$. Again investigation of strong convexity is motivated by
stability considerations, now for the more general situation of partially random
right-hand side.

To be able to state the mentioned result from [14] some preparation is
needed. The dual polyhedron (cf. (5)) belonging to the linear program in
(19) reads

$$M_D = \{(u_1, u_2) \in \mathbb{R}^{s_1 + s_2} : W_1^T u_1 + W_2^T u_2 \le q\}.$$

We will impose an assumption on $\pi_1 M_D$, where $\pi_1$ denotes the projection
from $\mathbb{R}^s$ to $\mathbb{R}^{s_1}$. Furthermore, the function $\Phi^\circ$ from (19) induces a piecewise
linear function $\Phi(t_1) = \Phi^\circ(t_1, z_2 - \chi_2)$. The lineality regions of $\Phi$ form a
polyhedral complex in $\mathbb{R}^{s_1}$. We will consider all members $\mathcal{F}^*_j$ of this complex
that arise as unbounded facets, i.e., unbounded members of dimension $s_1 - 1$.

The counterpart to Proposition 3 for the generalized function $\tilde{Q}_\mu$ from
(20) then reads ([14], Theorem 3.3):

Proposition 4. Assume that the matrix $\binom{W_1}{W_2}$ has full rank and that for each
$t_1 \in \mathbb{R}^{s_1}$ there exists a $y \in \mathbb{R}^m_+$ such that $W_1 y = t_1$ and $W_2 y = z_2 - \chi_2$.
Let the interior of $\pi_1 M_D$ be non-empty and $\int_{\mathbb{R}^{s_1}} \|z_1\|\,\mu(dz_1) < +\infty$. Suppose
that there exist a convex open set $V \subset \mathbb{R}^{s_1}$, constants $r > 0$, $\rho > 0$, points
$e^*_j \in \mathcal{F}^*_j$ and a density $\theta$ of $\mu$ such that $\theta(t') \ge r$ for all $t' \in \bigcup_j \{e^*_j\} + V_\rho$.
Then $\tilde{Q}_\mu$ is strongly convex on $V$.

Concerning verification, the above generalized density assumption is non-
trivial. It says that in each unbounded facet we have to find some point $e^*_j$
around which the density is uniformly bounded below by a positive constant.
In Proposition 3 this is less involved due to the simplicity of the polyhedral
complex of lineality regions. There, lineality regions of $\Phi$ coincide with the
outer normal cones to the dual polyhedron $M_D$, which is compact in this
case such that the polyhedral complex is a fan of cones with common vertex
zero. Therefore, the points $e^*_j$ can all be selected as zero and the condition
$t' \in \bigcup_j \{e^*_j\} + V_\rho$ turns into $t' \in V_\rho$.

To see what the complex looks like in the more general situation we recall
that, by duality,

$$\Phi(t_1) = \max\{t_1^T u_1 + \bar{t}_2^T u_2 : (u_1, u_2) \in M_D\} \qquad (21)$$

where $\bar{t}_2 = z_2 - \chi_2$. Suppose that $M_D$ has vertices, and let $d_1, \dots, d_N$ be
the vertices of $M_D$ that arise as optimal ones in (21) when $t_1$ varies in $\mathbb{R}^{s_1}$.
Denoting by $d_{i1}$, $d_{i2}$ ($i = 1, \dots, N$) the projections of $d_i$ on $\mathbb{R}^{s_1}$ and $\mathbb{R}^{s_2}$,
respectively, we obtain

$$\Phi(t_1) = t_1^T d_{i1} + \bar{t}_2^T d_{i2}$$

for all $t_1 \in \mathbb{R}^{s_1}$ such that $(t_1, \bar{t}_2)$ belongs to the outer normal cone $\mathcal{K}_i$ to $M_D$
at the vertex $d_i$. The lineality regions $\mathcal{K}^*_i$ of $\Phi$ then are given by

$$\mathcal{K}^*_i = \pi_1\big(\mathcal{K}_i \cap \{\mathbb{R}^{s_1} \times \{\bar{t}_2\}\}\big), \quad i \in \{1, \dots, N\}. \qquad (22)$$

They form a polyhedral complex that arises by intersecting the fan of outer
normal cones to $M_D$ with the affine subspace $\mathbb{R}^{s_1} \times \{\bar{t}_2\}$. The sets $\mathcal{F}^*_j$ are
the unbounded facets in this complex.

Some insight into the polyhedral complex in (22) can be gained by using
PORTA [5]. Let us illustrate this with the following example treated in more
detail in [14].

Example 1. In (19), (20) we put

$$q = (21, 21, 21, 21, 7, 7, 3, 3, 1, 0)^T \in \mathbb{R}^{10},$$

$$W_1 = \begin{pmatrix} -3 & -3 & -3 & 3 & -1 & 1 & 0 & 0 & 0 & 0 \\ -5 & 1 & 2 & -2 & -2 & 2 & -1 & 1 & 0 & 0 \end{pmatrix} \in L(\mathbb{R}^{10}, \mathbb{R}^2),$$

$$W_2 = (12\ \ 12\ \ 9\ \ 9\ \ 3\ \ 3\ \ 0\ \ 0\ \ 1\ \ {-1}) \in L(\mathbb{R}^{10}, \mathbb{R}^1),$$

$$z_2 = 1, \quad \chi_2 = 0.$$

Using PORTA [5], we computed the vertices of $M_D$ together with a vertex-
inequality incidence table displaying the binding inequalities for each vertex.
Gradients of the binding (linear) inequalities then generate the respective outer
normal cones:

$d_1 = (-7, 0, 0)$, $\mathcal{K}_1 = \mathrm{cone}\{(0, 0, -1), (-1, -2, 3), (-3, 2, 9), (-3, 1, 12), (-3, -5, 12)\}$,
$d_2 = (-1, -3, 0)$, $\mathcal{K}_2 = \mathrm{cone}\{(0, 0, -1), (0, -1, 0), (-1, -2, 3)\}$,
$d_3 = (5, -3, 0)$, $\mathcal{K}_3 = \mathrm{cone}\{(0, 0, -1), (0, -1, 0), (3, -2, 9)\}$,
$d_4 = (7, 0, 0)$, $\mathcal{K}_4 = \mathrm{cone}\{(0, 0, -1), (1, 2, 3), (3, -2, 9)\}$,
$d_5 = (1, 3, 0)$, $\mathcal{K}_5 = \mathrm{cone}\{(0, 0, -1), (1, 2, 3), (0, 1, 0)\}$,
$d_6 = (-5, 3, 0)$, $\mathcal{K}_6 = \mathrm{cone}\{(0, 0, -1), (0, 1, 0), (-3, 2, 9)\}$,
$d_7 = (-3, 0, 1)$, $\mathcal{K}_7 = \mathrm{cone}\{(0, 0, 1), (-3, 1, 12), (-3, -5, 12)\}$,
$d_8 = (2, -3, 1)$, $\mathcal{K}_8 = \mathrm{cone}\{(0, 0, 1), (0, -1, 0), (-1, -2, 3), (3, -2, 9), (-3, -5, 12)\}$,
$d_9 = (4, 0, 1)$, $\mathcal{K}_9 = \mathrm{cone}\{(0, 0, 1), (1, 2, 3), (3, -2, 9)\}$,
$d_{10} = (-2, 3, 1)$, $\mathcal{K}_{10} = \mathrm{cone}\{(0, 0, 1), (1, 2, 3), (0, 1, 0), (-3, 2, 9), (-3, 1, 12)\}$.
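
The incidence information in Example 1 can be cross-checked without PORTA once the vertices are given. The Python sketch below is an illustration only; the helper name `binding` is an assumption made here. It treats each column of the stacked matrix $(W_1; W_2)$ as the gradient of one dual inequality and collects, for a given vertex, the gradients of the inequalities that hold with equality.

```python
q = [21, 21, 21, 21, 7, 7, 3, 3, 1, 0]
W1 = [[-3, -3, -3, 3, -1, 1, 0, 0, 0, 0],
      [-5, 1, 2, -2, -2, 2, -1, 1, 0, 0]]
W2 = [12, 12, 9, 9, 3, 3, 0, 0, 1, -1]

# Column i of (W1; W2) is the gradient of the i-th inequality of the dual polyhedron.
columns = [(W1[0][i], W1[1][i], W2[i]) for i in range(10)]

vertices = [(-7, 0, 0), (-1, -3, 0), (5, -3, 0), (7, 0, 0), (1, 3, 0),
            (-5, 3, 0), (-3, 0, 1), (2, -3, 1), (4, 0, 1), (-2, 3, 1)]


def binding(d):
    """Gradients of the inequalities col^T d <= q_i that are tight at d."""
    gens = []
    for col, qi in zip(columns, q):
        lhs = sum(c * di for c, di in zip(col, d))
        if lhs > qi:
            raise ValueError("point violates a dual inequality")
        if lhs == qi:
            gens.append(col)
    return gens


# Generators of the outer normal cone at each listed vertex.
cones = {d: binding(d) for d in vertices}
```

For instance, `cones[(-7, 0, 0)]` contains the five generators listed for the first cone, and every listed point is feasible with at least three tight inequalities, as expected for a vertex in dimension three.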
One possibility to fulfill the density assumption from Proposition 4 is to
select $e^*_j$ as vertices of unbounded facets $\mathcal{F}^*_j$. To avoid further algorithmic
effort for distinguishing between bounded and unbounded members of the
lineality complex one could include all vertices of the complex into the list
of points $e^*_j$. Then the above generators can be helpful since they allow
to compute all vertices of the complex (22). Indeed, each intersection of a
positive multiple of some generator with the affine subspace $\mathbb{R}^2 \times \{1\}$ yields
a vertex and each vertex has to arise as such an intersection. Hence, there
is a one-to-one correspondence between the vertices and the generators with
positive third component. In this way, we obtain a list of 7 vertices of which
3 do not belong to unbounded facets. For more complicated examples, the
latter extraction may be non-trivial. In such cases the generalized density
assumption can be fulfilled by using all vertices of (22) instead of the points $e^*_j$.
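
The correspondence between generators with positive third component and vertices of the complex can be carried out mechanically. The following Python sketch is illustrative only; the function name `complex_vertices` is an assumption, and the generator set is simply the union of the generators listed in Example 1. Each generator with positive last component is scaled so that this component equals the fixed level 1, and the first two components are kept.

```python
from fractions import Fraction

# Union of all cone generators listed in Example 1.
generators = {(0, 0, -1), (0, 0, 1), (0, -1, 0), (0, 1, 0), (-1, -2, 3),
              (1, 2, 3), (-3, 2, 9), (3, -2, 9), (-3, 1, 12), (-3, -5, 12)}


def complex_vertices(gens, level=1):
    """Scale generators with positive last component to hit the subspace R^2 x {level}."""
    return sorted({(Fraction(g[0] * level, g[2]), Fraction(g[1] * level, g[2]))
                   for g in gens if g[2] > 0})


verts = complex_vertices(generators)
```

The resulting list has seven entries, in agreement with the count stated in the text; deciding which of them lie on unbounded facets is not attempted in this sketch.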

Another possibility to fulfill the generalized density assumption is to require
that $\theta(t') \ge r$ for all $t' \in B + V_\rho$ where $B$ is a bounded set containing all the
vertices from (22). To this end, we compute an upper bound for the norm of
these vertices. Again preprocessing is helpful. The outer normal cone $\mathcal{K}_i$ to
$M_D$ at $d_i$ can be written as

$$\mathcal{K}_i = \{u \in \mathbb{R}^s : u = W(i)v,\ v \ge 0\}$$

where the matrix $W(i)$ is given by the generators of $\mathcal{K}_i$ computed above.
Denoting by $W(i)_1$ and $W(i)_2$ the row blocks of $W(i)$ belonging to $\mathbb{R}^{s_1}$ and
$\mathbb{R}^{s_2}$, respectively, it holds that

$$\mathcal{K}^*_i = \{u_1 \in \mathbb{R}^{s_1} : u_1 = W(i)_1 v,\ \bar{t}_2 = W(i)_2 v,\ v \ge 0\}
= W(i)_1\big(\{v : W(i)_2 v = \bar{t}_2,\ v \ge 0\}\big).$$

Hence, the vertices of $\mathcal{K}^*_i$ are among the $W(i)_1$-images of the vertices
$v^{(i)}_j$ of $\{v : W(i)_2 v = \bar{t}_2,\ v \ge 0\}$. The latter
can be computed using PORTA [5]. For the desired upper bound all possible
basis submatrices of all $W(i)_2$ have to be extracted which, together with $\bar{t}_2$
and submatrices of the $W(i)_1$, yields representations for the vertices of $\mathcal{K}^*_i$.
These are bounded above by the usual estimates using matrix norms (see [14]
for details).

It is evident that the procedures discussed in this section are not suitable 
for large-scale problems. However, they can serve well for problems with 
moderate size. 



5 Conclusions 

Fourier-Motzkin elimination provides an elegant way to look at preprocessing 
in stochastic programming. Moreover there is a well tested code, PORTA [5], 
that is based on Fourier-Motzkin elimination, such that, in addition to the 
codes reported in [18], [19], another convenient computer tool for prepro-
cessing in stochastic programming is available. It is well known that Fourier-
Motzkin elimination is of exponential complexity such that application of the 
method has to be restricted to problems of moderate size. In the present paper 
we discussed two applications with natural size limitation. For stochastic pro- 
grams with integer recourse, preprocessing is helpful for restricting the search 
in enumeration algorithms. In the stability analysis of stochastic programs, 
Fourier-Motzkin elimination can be used to widen the class of problems for 
which sufficient stability conditions can be verified. 
Bounds for the Reliability of
k-out-of-connected-(r,s)-from-(m,n):F
Lattice Systems¹



T. Szantai 

Technical University of Budapest, Department of Mathematics 

Abstract. In the paper new bounds for the reliability of k-out-of-connected- 
(r,s)-from-(m,n):F lattice systems are given. These bounds are based on the 
Boole-Bonferroni bounding techniques and one of them on the Hunter- Worsley 
bound. All of these bounds need the calculation of the first few binomial moments 
of a random variable introduced according to the special reliability system. The 
main results of the paper are formulae for calculation of these binomial moments. 



Keywords. Reliability system, Boole-Bonferroni bounds. Binomial moments. 



1 Introduction 

The k-out-of-connected-(r,s)-from-(m,n):F lattice systems are straightforward
generalizations of the linear consecutive k-out-of-r-from-n:F failing systems.
The latter system was introduced by S. Kounias and M. Sfakianakis ([5], [6])
and was investigated by M.V. Koutras and S.G. Papastavridis ([7]). Boole-
Bonferroni type bounds were applied in this context first by M. Sfakianakis,
S. Kounias and A. Hillaris ([11]) and some sharper bounds of this type were
given later by A. Habib and T. Szantai ([3]). A conscientious survey paper of
the topic was written by M.T. Chao, J.C. Fu and M.V. Koutras ([2]).

The 2-dimensional generalizations of the above mentioned reliability systems
were investigated first by A.A. Salvia and W.C. Lasher ([10]) and then by T.K.
Boehme, A. Kossow and W. Preuss ([1]). Lower and upper bounds for the 2-
dimensional generalizations were published first in the paper by J. Malinowski
and W. Preuss ([8]). They investigated the so-called connected-(r,s)-out-of-
(m,n):F lattice system. In this paper we are dealing with a more general reliability
system which can be called k-out-of-connected-(r,s)-from-(m,n):F lattice sys-
tem. If in this system one takes k to be equal to r × s, one gets back the former system.
So this system can be regarded as a generalization of the former one.

¹This work was partly supported by grants from the National Scientific Research Fund,
Hungary, T014102

2 Definition of the investigated lattice system

Let us have n equal or unequal components (or elements) which are either oper-
ating or failing independently of each other in the stochastic sense.

k-out-of-n:F failing system 

The system of the n elements itself is failing if and only if there exists at least k 
failing elements in the system. 

Consecutive-k-out-of-n:F failing system

The system of the n elements itself is failing if and only if there exists at least k 
consecutive failing elements in the system. 

Consecutive-k-out-of-r-from-n:F failing system 

The system of the n elements itself is failing if and only if there exists an r- 
element consecutive part of the elements with at least k failing elements in it. 

In the above systems the elements can be arranged in a line or in a circle. If the 
elements of the system are arranged in a line then the system is called linear and 
if the elements of the system are arranged in a circle then the system is called 

circular. 

In the two-dimensional case the system consists of m × n components ordered
like the elements of an (m, n)-matrix.

Connected-(r,s)-out-of-(m,n):F lattice system

The system of the m × n elements itself is failing if and only if there exists at least
one connected (r, s) submatrix of the system with all failing elements.

k-out-of-connected-(r,s)-from-(m,n):F lattice system

The system of the m × n elements itself is failing if and only if there exists at least
one connected (r, s) submatrix of the system with at least k failing elements in
it.

According to the (i,j) indices of the upper-left corner element in the (r, s)-
submatrices we have

$$1 \le i \le m - r + 1; \qquad 1 \le j \le n - s + 1.$$

Let us denote

$$M = m - r + 1; \qquad N = n - s + 1,$$

then there exist M × N events causing system failure:

$$A_{i_1 j_1} = \{\text{at least } k \text{ elements in } E_{i_1 j_1} \text{ are failing}\}, \qquad 1 \le i_1 \le M;\ 1 \le j_1 \le N,$$

where

$$E_{i_1 j_1} = \{e_{ij} : i = i_1, \dots, i_1 + r - 1;\ j = j_1, \dots, j_1 + s - 1\}.$$

According to the correspondence

$$A_l \leftrightarrow A_{i_1 j_1},$$

where

$$i_1 = \Big\lfloor \frac{l - 1}{N} \Big\rfloor + 1, \qquad j_1 = l - (i_1 - 1)N,$$

or in opposite way

$$l = (i_1 - 1)N + j_1,$$

one can order the M × N events causing system failure in a linear order and the
probability of the system failure is

$$F = \Pr\Big\{\bigcup_{l=1}^{MN} A_l\Big\}.$$

The reliability of the system obviously is $R = 1 - F$.
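
For very small lattices the failure probability $F$ can be obtained by brute-force enumeration, which is a convenient reference value when testing bounds. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, elements fail independently with probabilities `p_fail[i][j]` (0-based indices), and the index maps implement the correspondence $l = (i_1 - 1)N + j_1$ given above. The running time grows as $2^{mn}$, so this is not a practical method.

```python
from itertools import product


def index_to_pair(l, N):
    """Map the linear index l (1-based) to the corner indices (i1, j1)."""
    i1 = (l - 1) // N + 1
    j1 = l - (i1 - 1) * N
    return i1, j1


def pair_to_index(i1, j1, N):
    """Inverse map: (i1, j1) -> l."""
    return (i1 - 1) * N + j1


def failure_probability(m, n, r, s, k, p_fail):
    """Exact F = Pr(union of A_l) by enumerating all 2^(m*n) element states."""
    M, N = m - r + 1, n - s + 1
    cells = [(i, j) for i in range(m) for j in range(n)]
    F = 0.0
    for state in product((0, 1), repeat=m * n):      # 1 means "failed"
        failed = dict(zip(cells, state))
        prob = 1.0
        for (i, j), f in failed.items():
            prob *= p_fail[i][j] if f else 1.0 - p_fail[i][j]
        for l in range(1, M * N + 1):                # events A_1, ..., A_MN
            i1, j1 = index_to_pair(l, N)
            in_window = sum(failed[(i, j)]
                            for i in range(i1 - 1, i1 - 1 + r)
                            for j in range(j1 - 1, j1 - 1 + s))
            if in_window >= k:                       # some event occurs: system fails
                F += prob
                break
    return F


half = [[0.5, 0.5], [0.5, 0.5]]
F_all = failure_probability(2, 2, 2, 2, 4, half)     # one window, all four must fail
F_any = failure_probability(2, 2, 1, 1, 1, half)     # any single failure suffices
```

The reliability then follows as `1 - F`; the two calls cover the extreme cases of one window requiring all elements to fail and of single-element windows.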

3 Bounding techniques of the system failure 

In this section we list some results on the so-called Boole-Bonferroni
type and Hunter-Worsley inequalities. We follow the treatment of the book by A.
Prekopa (see [9]) which is based on formulation of linear programming problems
according to the best possible Boole-Bonferroni type inequalities.

Let us regard a set of events $A_1, \dots, A_n$ on an arbitrary probability space. If we
want to give lower and upper bounds on the probability that at least one of these
events occurs, i.e., on the probability of their sum

$$P = \Pr\{A_1 + \dots + A_n\},$$

then it is useful to introduce the random variable $\mu$ designating the number of
those events $A_1, \dots, A_n$ which occur.
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For the binomial moments E{(^)} of the random variable fi we have 



E ^ = E = 5,, k = l, n. (3.1) 

We remark that the Sk,k = 1, ...,n terms in the formula (3.1) are the same as 
they are in the well-known inclusion-exclusion formula: 



kzzl 

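The simplest proof of the equality (3.1), which we reproduce here, was given by
Takács (see [12]). Let $\mu_1, \ldots, \mu_n$ be the indicator random variables of the events
$A_1, \ldots, A_n$, i.e. $\mu_i = 1$ if $A_i$ occurs and $\mu_i = 0$ otherwise, $i = 1, \ldots, n$. Thus,
$\mu = \mu_1 + \cdots + \mu_n$. By a well-known formula of Cauchy for binomial coefficients,
we have

$$\binom{\mu}{k} = \sum_{\substack{j_1 + \cdots + j_n = k \\ j_1 \ge 0, \ldots, j_n \ge 0}} \binom{\mu_1}{j_1} \cdots \binom{\mu_n}{j_n}.$$

Since $\mu_1, \ldots, \mu_n$ take only the values 0, 1, it follows that if any of the numbers
$j_1, \ldots, j_n$ is different from 0 or 1, then the term in the above sum is 0. Thus,

$$\binom{\mu}{k} = \sum_{1 \le i_1 < \cdots < i_k \le n} \mu_{i_1} \cdots \mu_{i_k}.$$

If we take expectations on both sides and use the equality

$$E(\mu_{i_1} \cdots \mu_{i_k}) = \Pr\{A_{i_1} \cdots A_{i_k}\},$$

(3.1) follows.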

On the other hand, if the discrete probability distribution of the random variable
$\mu$ is denoted by $p_j$, $j = 0, 1, \ldots, n$, i.e.

$$p_j = \Pr\{\mu = j\}, \quad j = 0, 1, \ldots, n,$$

then, taking into account that $\binom{j}{k}$ equals zero when $j < k$, by the definition of the
expected value we get the linear equation system

$$E\Big\{\binom{\mu}{k}\Big\} = \sum_{j=1}^{n} \binom{j}{k} p_j, \quad k = 1, \ldots, n. \qquad (3.2)$$

From the equations (3.1) and (3.2) we get the following linear equation system

$$S_k = \sum_{j=1}^{n} \binom{j}{k} p_j, \quad k = 1, \ldots, n.$$

Now if we know all of the binomial moments $S_1, \ldots, S_n$, then the unique solution
of the above linear equation system gives the probabilities $p_1, \ldots, p_n$, and their
sum equals the probability value $P$ we were looking for.
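The identities (3.1) and (3.2) and the inclusion-exclusion formula can be checked numerically on a toy example. The sketch below assumes a uniform sample space of eight outcomes and three hand-picked events; these data and all names are illustrative and do not come from the paper.

```python
from itertools import combinations
from math import comb

# Toy probability space: 8 equally likely outcomes, 3 events given as sets.
outcomes = range(8)
events = [{0, 1, 2, 3}, {2, 3, 4, 5}, {4, 5, 6}]
n = len(events)


def S(k):
    """Binomial moment S_k as the sum of k-fold intersection probabilities (3.1)."""
    return sum(len(set.intersection(*c)) / 8 for c in combinations(events, k))


# Distribution of mu = number of events that occur.
mu = [sum(w in a for a in events) for w in outcomes]
p = [mu.count(j) / 8 for j in range(n + 1)]


def S_from_p(k):
    """The same moment computed from the distribution of mu, as in (3.2)."""
    return sum(comb(j, k) * p[j] for j in range(1, n + 1))


P_incl_excl = sum((-1) ** (k - 1) * S(k) for k in range(1, n + 1))
P_direct = sum(p[1:])  # Pr{mu >= 1}
```

Both ways of computing the moments agree, and the alternating sum of the moments reproduces the probability of the union.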

In the case when only the first few binomial moments $S_1, \ldots, S_m$ are known, where
$m \ll n$, we can regard the probabilities $p_1, \ldots, p_n$ as unknown variables and
formulate the linear programming problems

$$\begin{array}{l}
\min \ \{ p_1 + p_2 + \cdots + p_m + \cdots + p_n \} \\[4pt]
p_1 + \binom{2}{1} p_2 + \cdots + \binom{m}{1} p_m + \cdots + \binom{n}{1} p_n = S_1, \\[2pt]
\phantom{p_1 + {}} p_2 + \cdots + \binom{m}{2} p_m + \cdots + \binom{n}{2} p_n = S_2, \\[2pt]
\qquad \vdots \\[2pt]
\phantom{p_1 + p_2 + \cdots + {}} p_m + \cdots + \binom{n}{m} p_n = S_m, \\[4pt]
p_1 \ge 0, \ p_2 \ge 0, \ \ldots, \ p_m \ge 0, \ \ldots, \ p_n \ge 0
\end{array}$$

and

$$\begin{array}{l}
\max \ \{ p_1 + p_2 + \cdots + p_m + \cdots + p_n \} \\[4pt]
p_1 + \binom{2}{1} p_2 + \cdots + \binom{m}{1} p_m + \cdots + \binom{n}{1} p_n = S_1, \\[2pt]
\phantom{p_1 + {}} p_2 + \cdots + \binom{m}{2} p_m + \cdots + \binom{n}{2} p_n = S_2, \\[2pt]
\qquad \vdots \\[2pt]
\phantom{p_1 + p_2 + \cdots + {}} p_m + \cdots + \binom{n}{m} p_n = S_m, \\[4pt]
p_1 \ge 0, \ p_2 \ge 0, \ \ldots, \ p_m \ge 0, \ \ldots, \ p_n \ge 0.
\end{array}$$



The optimal solutions of the above linear programming problems provide us with
the best possible lower respectively upper bounds on the probability value

$$P = \Pr\{\mu \ge 1\} = \Pr\{A_1 + \cdots + A_n\}.$$

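For small values of $m$ there are known formulae for the optimal solutions of the
linear programming problems.

Case $m = 2$:

$$\frac{2}{i^*+1} S_1 - \frac{2}{i^*(i^*+1)} S_2 \le P \le S_1 - \frac{2}{n} S_2, \qquad (3.3)$$

where

$$i^* = 1 + \left\lfloor \frac{2 S_2}{S_1} \right\rfloor.$$

Case $m = 3$:

$$\frac{j^*+2n-1}{(j^*+1)n} S_1 - \frac{2(2j^*+n-2)}{j^*(j^*+1)n} S_2 + \frac{6}{j^*(j^*+1)n} S_3 \le P \le S_1 - \frac{2(2k^*-1)}{k^*(k^*+1)} S_2 + \frac{6}{k^*(k^*+1)} S_3, \qquad (3.4)$$

where

$$j^* = \left\lfloor \frac{-6 S_3 + 2(n-2) S_2}{-2 S_2 + (n-1) S_1} \right\rfloor + 1$$

and

$$k^* = \left\lfloor \frac{3 S_3}{S_2} \right\rfloor + 2.$$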
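The closed-form bounds (3.3) and (3.4) translate directly into code. The following is a minimal sketch assuming $S_1 > 0$, $S_2 > 0$ and non-vanishing denominators; the function names are illustrative. The example uses three independent events of probability 0.5 each, for which $S_1 = 1.5$, $S_2 = 0.75$, $S_3 = 0.125$ and the exact union probability is $1 - 0.5^3 = 0.875$.

```python
from math import floor


def bounds_s2(S1, S2, n):
    """Boole-Bonferroni bounds on P based on S1 and S2, formula (3.3)."""
    i = 1 + floor(2 * S2 / S1)
    lower = 2 / (i + 1) * S1 - 2 / (i * (i + 1)) * S2
    upper = S1 - 2 / n * S2
    return lower, upper


def bounds_s3(S1, S2, S3, n):
    """Boole-Bonferroni bounds on P based on S1, S2 and S3, formula (3.4)."""
    j = 1 + floor((-6 * S3 + 2 * (n - 2) * S2) / (-2 * S2 + (n - 1) * S1))
    k = 2 + floor(3 * S3 / S2)
    lower = ((j + 2 * n - 1) / ((j + 1) * n) * S1
             - 2 * (2 * j + n - 2) / (j * (j + 1) * n) * S2
             + 6 / (j * (j + 1) * n) * S3)
    upper = S1 - 2 * (2 * k - 1) / (k * (k + 1)) * S2 + 6 / (k * (k + 1)) * S3
    return lower, upper


# Three independent events, each with probability 0.5.
lo2, up2 = bounds_s2(1.5, 0.75, 3)
lo3, up3 = bounds_s3(1.5, 0.75, 0.125, 3)
```

With $n = 3$ all binomial moments are used in the third-order case, so both third-order bounds coincide with the exact value, while the second-order bounds only bracket it.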



A further upper bound was given independently by D. Hunter ([4]) and by
K.J. Worsley ([13]). This bound is based only on $S_1$ and some individual
probability values involved in $S_2$, but it proves to be sharper than the
Boole-Bonferroni upper bound based on $S_1, S_2$ and sometimes even sharper than the
Boole-Bonferroni bound based on $S_1, S_2, S_3$. This bound is given by

$$P \le S_1 - \sum_{(i,j) \in T^*} \Pr\{A_i A_j\}, \qquad (3.5)$$

where $T^*$ is a maximal spanning tree in the non-oriented complete graph with
$n$ nodes for which the probabilities $\Pr\{A_i\}$ respectively $\Pr\{A_i A_j\}$ are assigned
to the nodes respectively to the arcs as weights.
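A minimal sketch of the Hunter-Worsley bound (3.5) follows. It finds a maximum-weight spanning tree with Prim's algorithm on a dense weight matrix; this choice of algorithm, the function name and the toy events are assumptions made for illustration and are not taken from the paper.

```python
def hunter_worsley_upper(p_single, p_pair):
    """Upper bound S1 minus the weight of a maximum spanning tree, as in (3.5).

    p_single[i] = Pr{A_i}; p_pair[i][j] = Pr{A_i A_j} (symmetric matrix).
    """
    n = len(p_single)
    in_tree = [False] * n
    best = [float("-inf")] * n  # heaviest arc linking each node to the tree
    best[0] = 0.0               # the start node contributes no arc
    tree_weight = 0.0
    for _ in range(n):
        v = max((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[v] = True
        tree_weight += best[v]
        for w in range(n):
            if not in_tree[w] and p_pair[v][w] > best[w]:
                best[w] = p_pair[v][w]
    return sum(p_single) - tree_weight


# Toy example: 8 equally likely outcomes, three events given as sets.
events = [{0, 1, 2, 3}, {2, 3, 4, 5}, {4, 5, 6}]
p_single = [len(a) / 8 for a in events]
p_pair = [[len(a & b) / 8 for b in events] for a in events]
bound = hunter_worsley_upper(p_single, p_pair)
exact = len(set().union(*events)) / 8
```

In this example the events overlap in a chain, so the tree captures every non-empty pairwise intersection and the bound is attained.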
4 Binomial moment calculations

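Let us introduce the notation $q = 1 - p$; then we have the following.

Calculation of $S_1$

$$S_1 = \sum_{l=1}^{MN} \Pr\{A_{i_1 j_1}\}, \qquad (4.1)$$

where the index $l$ can be associated with the following row resp. column index:

$$i_1 = \left\lfloor \frac{l-1}{N} \right\rfloor + 1, \quad j_1 = l - (i_1 - 1)N.$$

The probability $\Pr\{A_{i_1 j_1}\}$ can be calculated as the sum of the appropriate
binomial probabilities:

$$\Pr\{A_{i_1 j_1}\} = \sum_{x=l_1}^{u_1} \binom{u_1}{x} q^x p^{u_1 - x}, \qquad (4.2)$$

where $l_1$ and $u_1$ are the lower resp. upper limits of the summation:

$$u_1 = rs, \quad l_1 = k.$$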
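The binomial tail (4.2) and the resulting first binomial moment can be sketched as follows, assuming independent elements that each fail with probability $q = 1 - p$. The function names are illustrative; the moment is obtained by multiplying the common single-event probability by the number $MN$ of submatrices.

```python
from math import comb


def single_event_prob(r, s, k, p):
    """Pr{at least k of the r*s elements of one submatrix fail}, formula (4.2)."""
    q = 1.0 - p
    u1 = r * s
    return sum(comb(u1, x) * q ** x * p ** (u1 - x) for x in range(k, u1 + 1))


def first_binomial_moment(m, n, r, s, k, p):
    """S1 as the sum of MN identical single-event probabilities."""
    M, N = m - r + 1, n - s + 1
    return M * N * single_event_prob(r, s, k, p)


# Special case k = rs for a 4 x 3 lattice with 2 x 2 submatrices, p = 0.5.
P1 = single_event_prob(2, 2, 4, 0.5)
S1 = first_binomial_moment(4, 3, 2, 2, 4, 0.5)
```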

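Calculation of $S_2$

$$S_2 = \sum_{l_1=1}^{MN-1} \sum_{l_2=l_1+1}^{MN} \Pr\{A_{i_1 j_1} A_{i_2 j_2}\}, \qquad (4.3)$$

where the indices $l_1, l_2$ can be associated with the following row resp. column
indices:

$$i_1 = \left\lfloor \frac{l_1-1}{N} \right\rfloor + 1, \quad j_1 = l_1 - (i_1 - 1)N,$$

$$i_2 = \left\lfloor \frac{l_2-1}{N} \right\rfloor + 1, \quad j_2 = l_2 - (i_2 - 1)N.$$

The calculation of the probability $\Pr\{A_{i_1 j_1} A_{i_2 j_2}\}$ now depends on the number of
common elements in the sets $E_{i_1 j_1}$, $E_{i_2 j_2}$:

$$n_{12} = \max\{0, r - (i_2 - i_1)\} \max\{0, s - |j_2 - j_1|\}.$$

If $n_{12} = 0$ then the events $A_{i_1 j_1}$, $A_{i_2 j_2}$ are obviously independent and so

$$\Pr\{A_{i_1 j_1} A_{i_2 j_2}\} = \Pr\{A_{i_1 j_1}\} \Pr\{A_{i_2 j_2}\}. \qquad (4.4)$$

If $n_{12} > 0$ then the theorem of total probability can be applied by conditioning
on the exact number of failing common elements in the sets $E_{i_1 j_1}$, $E_{i_2 j_2}$. After
some algebra we get

$$\Pr\{A_{i_1 j_1} A_{i_2 j_2}\} = \sum_{x_1=l_1}^{u_1} \sum_{x_2=l_2}^{u_2} \sum_{x_3=l_3}^{u_3} \binom{u_1}{x_1} \binom{u_2}{x_2} \binom{u_3}{x_3} q^x p^{u-x}, \qquad (4.5)$$

where

$$\begin{array}{l}
u_1 = n_{12}, \\
u_2 = rs - n_{12}, \\
u_3 = u_2, \\
l_1 = \max\{0, k - n_{12}\}, \\
l_2 = \max\{0, k - x_1\}, \\
l_3 = l_2, \\
x = x_1 + x_2 + x_3, \\
u = u_1 + u_2 + u_3.
\end{array}$$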
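The pair probability can be sketched by conditioning on the number of failing common elements, in the spirit of (4.5). The sketch assumes independent elements with failure probability $q = 1 - p$; the function name and the offset arguments `di`, `dj` (row and column distance of the two corner elements) are illustrative. Rather than transcribing the summation limits, the loops enforce the two conditions directly: at least $k$ failures in each of the two submatrices.

```python
from math import comb


def pair_event_prob(r, s, k, p, di, dj):
    """Pr{both submatrices contain at least k failing elements}.

    di, dj: row and column offsets between the two upper-left corners.
    """
    q = 1.0 - p
    n12 = max(0, r - abs(di)) * max(0, s - abs(dj))  # common elements
    u2 = r * s - n12                                 # elements private to each set
    u = n12 + 2 * u2                                 # elements in the union
    total = 0.0
    for x1 in range(n12 + 1):                        # failures among common elements
        for x2 in range(max(0, k - x1), u2 + 1):     # private failures, first set
            for x3 in range(max(0, k - x1), u2 + 1): # private failures, second set
                x = x1 + x2 + x3
                total += (comb(n12, x1) * comb(u2, x2) * comb(u2, x3)
                          * q ** x * p ** (u - x))
    return total


# Special case k = rs = 4 with p = 0.5: every element of both sets must fail.
adjacent = pair_event_prob(2, 2, 4, 0.5, 0, 1)   # two common elements
diagonal = pair_event_prob(2, 2, 4, 0.5, 1, 1)   # one common element
disjoint = pair_event_prob(2, 2, 4, 0.5, 2, 0)   # no common element
```

For disjoint submatrices the triple loop reduces to the product of two single-event probabilities, in agreement with the independent case.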

Calculation of $S_3$

$$S_3 = \sum_{l_1=1}^{MN-2} \sum_{l_2=l_1+1}^{MN-1} \sum_{l_3=l_2+1}^{MN} \Pr\{A_{i_1 j_1} A_{i_2 j_2} A_{i_3 j_3}\}, \qquad (4.6)$$

where the indices $l_1, l_2, l_3$ can be associated with the following row resp. column
indices:

$$i_t = \left\lfloor \frac{l_t-1}{N} \right\rfloor + 1, \quad j_t = l_t - (i_t - 1)N, \quad t = 1, 2, 3.$$

The calculation of the probability $\Pr\{A_{i_1 j_1} A_{i_2 j_2} A_{i_3 j_3}\}$ now depends on the
number of common elements in the sets $E_{i_1 j_1}, E_{i_2 j_2}$; $E_{i_1 j_1}, E_{i_3 j_3}$ and $E_{i_2 j_2}, E_{i_3 j_3}$:

$$n_{12} = \max\{0, r - (i_2 - i_1)\} \max\{0, s - |j_2 - j_1|\},$$

$$n_{13} = \max\{0, r - (i_3 - i_1)\} \max\{0, s - |j_3 - j_1|\},$$

$$n_{23} = \max\{0, r - (i_3 - i_2)\} \max\{0, s - |j_3 - j_2|\}.$$

If at least two of $n_{12}, n_{13}, n_{23}$ equal zero, the calculation of the probability
$\Pr\{A_{i_1 j_1} A_{i_2 j_2} A_{i_3 j_3}\}$ is relatively easy. Because of the independence it can be
carried out by applying the formulae given earlier.

If $n_{12} = 0$, $n_{13} = 0$, $n_{23} = 0$ then

$$\Pr\{A_{i_1 j_1} A_{i_2 j_2} A_{i_3 j_3}\} = \Pr\{A_{i_1 j_1}\} \Pr\{A_{i_2 j_2}\} \Pr\{A_{i_3 j_3}\}. \qquad (4.7)$$

If $n_{12} > 0$, $n_{13} = 0$, $n_{23} = 0$ then

$$\Pr\{A_{i_1 j_1} A_{i_2 j_2} A_{i_3 j_3}\} = \Pr\{A_{i_1 j_1} A_{i_2 j_2}\} \Pr\{A_{i_3 j_3}\}. \qquad (4.8)$$

If $n_{12} = 0$, $n_{13} > 0$, $n_{23} = 0$ then

$$\Pr\{A_{i_1 j_1} A_{i_2 j_2} A_{i_3 j_3}\} = \Pr\{A_{i_1 j_1} A_{i_3 j_3}\} \Pr\{A_{i_2 j_2}\}. \qquad (4.9)$$

If $n_{12} = 0$, $n_{13} = 0$, $n_{23} > 0$ then

$$\Pr\{A_{i_1 j_1} A_{i_2 j_2} A_{i_3 j_3}\} = \Pr\{A_{i_1 j_1}\} \Pr\{A_{i_2 j_2} A_{i_3 j_3}\}. \qquad (4.10)$$

If more than one of $n_{12}, n_{13}, n_{23}$ are positive, then the theorem of total probability
can be applied recursively by conditioning on the exact number of failing common
elements in the sets $E_{i_1 j_1}, E_{i_2 j_2}$; $E_{i_1 j_1}, E_{i_3 j_3}$ and $E_{i_2 j_2}, E_{i_3 j_3}$ one after the other.

Again after some algebra we get the following formulae.

If

$$n_{12} > 0, \ n_{13} > 0, \ n_{23} = 0,$$

or

$$n_{12} > 0, \ n_{13} = 0, \ n_{23} > 0,$$

or

$$n_{12} = 0, \ n_{13} > 0, \ n_{23} > 0,$$

then

$$\Pr\{A_{i_1 j_1} A_{i_2 j_2} A_{i_3 j_3}\} = \sum_{x_1=l_1}^{u_1} \cdots \sum_{x_5=l_5}^{u_5} \binom{u_1}{x_1} \cdots \binom{u_5}{x_5} q^x p^{u-x}, \qquad (4.11)$$

where

$$\begin{array}{l}
u_1 = rs - nc_1 - nc_2, \\
u_2 = nc_1, \\
u_3 = nc_2, \\
u_4 = rs - nc_1, \\
u_5 = rs - nc_2, \\
l_1 = \max\{0, k - nc_1 - nc_2\}, \\
l_2 = \max\{0, k - rs\}, \\
l_3 = l_2, \\
l_4 = \max\{0, k - x_2\}, \\
l_5 = \max\{0, k - x_3\}, \\
x = x_1 + x_2 + x_3 + x_4 + x_5, \\
u = u_1 + u_2 + u_3 + u_4 + u_5,
\end{array}$$

and

$$\begin{array}{lll}
nc_1 = n_{13}, & nc_2 = n_{23}, & \text{if } n_{12} = 0, \\
nc_1 = n_{12}, & nc_2 = n_{23}, & \text{if } n_{13} = 0, \\
nc_1 = n_{12}, & nc_2 = n_{13}, & \text{if } n_{23} = 0.
\end{array}$$

Finally, if $n_{12} > 0$, $n_{13} > 0$, $n_{23} > 0$, then

$$\Pr\{A_{i_1 j_1} A_{i_2 j_2} A_{i_3 j_3}\} = \sum_{x_1=l_1}^{u_1} \cdots \sum_{x_7=l_7}^{u_7} \binom{u_1}{x_1} \cdots \binom{u_7}{x_7} q^x p^{u-x}, \qquad (4.12)$$

where

$$\begin{array}{l}
u_1 = (r - (i_3 - i_1)) (s - \max\{|j_2 - j_1|, |j_3 - j_1|, |j_3 - j_2|\}), \\
u_2 = n_{12} - u_1, \\
u_3 = n_{13} - u_1, \\
u_4 = n_{23} - u_1, \\
u_5 = rs - u_1 - u_2 - u_3, \\
u_6 = rs - u_1 - u_2 - u_4, \\
u_7 = rs - u_1 - u_3 - u_4, \\
l_1 = \max\{0, k - (rs - u_1)\}, \\
l_2 = \max\{0, k - x_1 - (rs - n_{12})\}, \\
l_3 = \max\{0, k - x_1 - (rs - n_{13})\}, \\
l_4 = \max\{0, k - x_1 - (rs - n_{23})\}, \\
l_5 = \max\{0, k - x_1 - x_2 - x_3\}, \\
l_6 = \max\{0, k - x_1 - x_2 - x_4\}, \\
l_7 = \max\{0, k - x_1 - x_3 - x_4\}, \\
x = x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7, \\
u = u_1 + u_2 + u_3 + u_4 + u_5 + u_6 + u_7.
\end{array}$$
Economic calculation of $S_1$ and $S_2$

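It is obvious that the single event probability $\Pr\{A_{i_1 j_1}\}$ is independent of the
indices $i_1, j_1$, so we can introduce the notation

$$P_1 = \Pr\{A_{i_1 j_1}\},$$

and

$$S_1 = MN P_1. \qquad (4.13)$$

It is also obvious that the pair event probabilities $\Pr\{A_{i_1 j_1} A_{i_2 j_2}\}$ depend
on the indices $i_1, j_1; i_2, j_2$ only through the differences

$$\mathit{id}_{12} = i_2 - i_1, \quad \mathit{jd}_{12} = j_2 - j_1,$$

so we can introduce the notation

$$P_2(\mathit{id}_{12}, \mathit{jd}_{12}) = \Pr\{A_{i_1 j_1} A_{i_2 j_2}\}.$$

Then after some combinatorial considerations one can get the formula

$$\begin{array}{rl}
S_2 = & \Big\{ \binom{MN}{2} - \Big( M \, \mathit{iu}_{12} - \binom{\mathit{iu}_{12}+1}{2} \Big) \big( N + 2N \mathit{ju}_{12} - \mathit{ju}_{12}(\mathit{ju}_{12}+1) \big) - M \Big( N \mathit{ju}_{12} - \binom{\mathit{ju}_{12}+1}{2} \Big) \Big\} P_1^2 \\[6pt]
& + \sum_{\mathit{jd}_{12}=1}^{\mathit{ju}_{12}} M (N - \mathit{jd}_{12}) P_2(0, \mathit{jd}_{12}) \\[6pt]
& + \sum_{\mathit{id}_{12}=1}^{\mathit{iu}_{12}} (M - \mathit{id}_{12}) \Big( N P_2(\mathit{id}_{12}, 0) + 2 \sum_{\mathit{jd}_{12}=1}^{\mathit{ju}_{12}} (N - \mathit{jd}_{12}) P_2(\mathit{id}_{12}, \mathit{jd}_{12}) \Big),
\end{array} \qquad (4.14)$$

where

$$\mathit{iu}_{12} = \min\{r - 1, M - 1\}, \quad \mathit{ju}_{12} = \min\{s - 1, N - 1\}.$$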
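The idea behind the economic calculation, namely that pair probabilities depend only on the index differences, can be sketched by summing over offsets instead of over all pairs. The counting used here (an offset `(di, dj)` occurs `(M - di) * (N - |dj|)` times) is an elementary argument and not a transcription of the printed formula; the function names are illustrative. The example uses the special case $k = rs$, where a pair event requires all elements of both submatrices to fail.

```python
def second_moment_by_offsets(M, N, pair_prob):
    """S2 as a sum over corner offsets; pair_prob(di, dj) gives Pr{A A'}."""
    total = 0.0
    for di in range(M):
        for dj in range(-(N - 1), N):
            if di == 0 and dj <= 0:
                continue  # count each unordered pair exactly once
            total += (M - di) * (N - abs(dj)) * pair_prob(di, dj)
    return total


# Special case k = rs: both submatrices fail completely.
r, s, q = 2, 2, 0.5


def all_fail_pair(di, dj):
    n12 = max(0, r - di) * max(0, s - abs(dj))  # number of common elements
    return q ** (2 * r * s - n12)


# 4 x 3 lattice with 2 x 2 submatrices: M = 3, N = 2.
S2 = second_moment_by_offsets(3, 2, all_fail_pair)
pair_count = second_moment_by_offsets(3, 2, lambda di, dj: 1.0)
```

Setting every pair probability to one returns the number of unordered pairs, which is a convenient check of the counting.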







5 Examples 

In the following tables the results are given for all of the test
problems investigated by J. Malinowski and W. Preuss ([8]). In the heading
of the tables the following abbreviations are used:

p          reliability of one element in the lattice system,
lo(M-P)    lower bound by Malinowski and Preuss,
up(M-P)    upper bound by Malinowski and Preuss,
exact      exact value for the reliability of the system,
lo(S2)     Boole-Bonferroni lower bound based on $S_1$ and $S_2$,
up(S2)     Boole-Bonferroni upper bound based on $S_1$ and $S_2$,
lo(S3)     Boole-Bonferroni lower bound based on $S_1$, $S_2$ and $S_3$,
up(S3)     Boole-Bonferroni upper bound based on $S_1$, $S_2$ and $S_3$,
lo(Hu)     Hunter-Worsley lower bound.



We remark that the bounds given in the paper refer to the system failure
probability. When we turn to bounds on the reliability of the system,
lower bounds become upper bounds and vice versa. So the Hunter-Worsley upper
bound becomes a lower bound on the reliability of the system. This is
the reason why in the tables the Hunter-Worsley bounds always appear as lower
bounds. In the tables the uniquely best lower resp. upper bounds are denoted by
bold-faced characters.

The test problems are fully defined by the values of m, n, r, s and p, as we always
take k = rs to get the special cases investigated by J. Malinowski and W.
Preuss ([8]). The lo(M-P), up(M-P) and exact values were taken from the above
paper by J. Malinowski and W. Preuss. The lo(S2), up(S2) and lo(S3), up(S3)
values were calculated by the formulae (3.3) and (3.4), while for the calculation
of the lo(Hu) values the formula (3.5) was used. For applying formulae (3.3)-(3.5)
only the binomial moments $S_1$, $S_2$ and $S_3$ are necessary. The calculation
procedure for these moments is fully defined in Section 4. For completeness,
$S_1$ can be calculated by (4.1), using (4.2); $S_2$ can be calculated by (4.3), using
(4.4)-(4.5); and $S_3$ can be calculated by (4.6), using (4.7)-(4.12). It is important
to remark here that while the economic calculation form (4.13) for $S_1$ can
always be applied, the economic calculation form (4.14) for $S_2$ can be applied only
in the case when we do not want to calculate the lo(Hu) values, as these latter
need not only the value $S_2$ but also the individual probabilities
$\Pr\{A_{i_1 j_1} A_{i_2 j_2}\}$.
Table 1





m = 4, n = ?, r = ?, s = 2

p     lo(M-P)   lo(S3)   exact   up(S3)   up(M-P)
0.5   0.601     0.621    0.644   0.699    0.712
0.7   0.933     0.937    0.937   0.938    0.955
0.9   0.999     0.999    0.999   0.999    0.999






Table 2

m = 4, n = 3, r = 2, s = 2

p     lo(M-P)   lo(S3)   exact   up(S3)   up(M-P)
0.5   0.712     0.729    0.740   0.755    0.793
0.7   0.955     0.957    0.957   0.957    0.969
0.9   0.999     0.999    0.999   0.999    0.999






Table 3

m = 5, n = 5, r = 2, s = 2

p     lo(M-P)   lo(Hu)   lo(S3)   up(S3)   up(M-P)
0.5   0.409     0.234    0.396    0.583    0.640
0.7   0.885     0.881    0.891    0.896    0.941
0.9   0.998     0.998    0.998    0.998    0.999






Table 4

m = 10, n = 10, r = 2, s = 2

p     lo(M-P)   lo(Hu)   lo(S3)   up(S3)   up(M-P)
0.5   0.012     0.000    0.000    0.245    0.086
0.7   0.541     0.402    0.544    0.666    0.711
0.9   0.992     0.992    0.992    0.992    0.996






Table 5

m = 10, n = 10, r = 3, s = 3

p     lo(M-P)   lo(Hu)   lo(S3)   up(S3)   up(M-P)
?     ?         0.890    0.903    0.919    0.959
?     0.999     0.999    0.999    0.999    0.999
0.9   ?         0.999    0.999    0.999    0.999
Table 6





m = 20, n = 20, r = 3, s = 3

p     lo(M-P)   lo(Hu)   lo(S3)   up(S3)   up(M-P)
0.6   0.923     0.920    0.928    0.931    0.974
0.7   0.994     0.994    0.994    0.994    0.998
0.8   0.999     0.999    0.999    0.999    0.999
0.9   0.999     0.999    0.999    0.999    0.999






Table 7

m = 50, n = 50, r = 3, s = 3

p     lo(M-P)   lo(Hu)   lo(S3)   up(S3)   up(M-P)
0.6   0.567     0.435    0.570    0.673    0.828
0.7   0.957     0.956    0.958    0.958    0.985
0.8   0.999     0.999    0.999    0.999    0.999
0.9   0.999     0.999    0.999    0.999    0.999






Table 8

m = 100, n = 100, r = 3, s = 3

p     lo(M-P)   lo(Hu)   lo(S2)   up(S2)   up(M-P)
0.6   0.094     0.000    0.000    0.336    0.451
0.7   0.832     0.816    0.811    0.840    0.940
0.8   0.995     0.995    0.995    0.995    0.998
0.9   0.999     0.999    0.999    0.999    0.999



6 Conclusions 

The Boole-Bonferroni upper bounds are sharper than the upper bounds proposed
by Malinowski and Preuss, except in one case with extremely low system
reliability. The lower bounds given in the paper by Malinowski and Preuss ([8]) are
sometimes sharper than the Boole-Bonferroni lower bounds. This is the case
especially when the system is large and the calculation of the more accurate
Boole-Bonferroni bounds using the first three binomial moments becomes impossible.
Finally we remark that while J. Malinowski and W. Preuss deal with
the special connected-(r,s)-out-of-(m,n):F lattice systems only, the bounds given
in this paper apply to the more general k-out-of-connected-(r,s)-from-(m,n):F
lattice systems.



Acknowledgement. The author would like to express his thanks to the
referee for the invaluable comments and advice.







References 

[1] Boehme, T.K., Kossow, A. and Preuss, W. "A generalization of
consecutive-k-out-of-n:F systems", IEEE Trans. Reliability, Vol. 41, 1992, 451-457.

[2] Chao, M.T., Fu, J.C. and Koutras, M.V. "Survey of reliability studies of
consecutive-k-out-of-n:F & related systems", IEEE Trans. Reliability, Vol. 44,
1995, 120-127.

[3] Habib, A. and Szántai, T. "New bounds on the reliability of the consecutive
k-out-of-r-from-n:F system", submitted to Reliability Engineering and System
Safety.

[4] Hunter, D. "An upper bound for the probability of a union", J. Appl. Prob.,
Vol. 13, 1976, 597-603.

[5] Kounias, S. and Sfakianakis, M. "The reliability of a linear system and its
connection with the generalized birthday problem", Statistica Applicata, Vol.
3, 1991, 531-543.

[6] Kounias, S. and Sfakianakis, M. "A combinatorial problem associated with
the reliability of a circular system", Proc. HERMIS'92 (Lipitakis, ed.), 1992,
187-196.

[7] Koutras, M.V. and Papastavridis, S.G. "Application of the Stein-Chen
method for bounds and limit theorems in the reliability of coherent structures",
Nav. Research Logistics, Vol. 40, 1993, 617-631.

[8] Malinowski, J. and Preuss, W. "Lower and upper bounds for the reliability
of connected-(r,s)-out-of-(m,n):F lattice systems", IEEE Trans. Reliability,
Vol. 45, 1996, 156-160.

[9] Prékopa, A. Stochastic Programming. Kluwer Academic Publishers,
Dordrecht, 1995.

[10] Salvia, A.A. and Lasher, W.C. "2-Dimensional consecutive-k-out-of-n:F
models", IEEE Trans. Reliability, Vol. 39, 1990, 382-385.

[11] Sfakianakis, M., Kounias, S. and Hillaris, A. "Reliability of a
consecutive-k-out-of-r-from-n:F system", IEEE Trans. Reliability, Vol. 41, 1992, 442-447.

[12] Takács, L. "On the general probability theorem", Communications of the
Dept. of Math. and Physics of the Hungarian Acad. Sci., Vol. 5, 1955, 467-476.

[13] Worsley, K.J. "An improved Bonferroni inequality and applications",
Biometrika, Vol. 69, 1982, 297-302.




On a Relation between Problems of Calculus of Variations 
and Mathematical Programming 

Tamás RAPCSÁK (1) and Anna VÁSÁRHELYI (2)

(1) Laboratory of Operations Research and Decision Systems, Computer and
Automation Institute, Hung. Acad. of Sciences, P.O. Box 63, H-1518 Budapest,
Hungary

(2) Technical Univ. of Budapest, Műegyetem rkp. 3, H-1111 Budapest, Hungary



Abstract. Important practical problems of calculus of variations can be 
transformed into one-parametric optimization ones. Simple engineering examples 
are presented to show that the two kinds of solution coincide. 



Keywords: Parametric optimization, calculus of variations 



1. Introduction 

Several problems in mechanics can be described with the help of systems of
differential equations with initial and boundary conditions. In the case of boundary
value problems, there is no difference between the solution of a system of
differential equations and that of the corresponding problem of calculus of variations.
Problems of calculus of variations do not seem to be well suited to initial value
problems, or can be handled only by difficult approximations. The majority of
numerical methods determine an approximate solution, but the question is how
precise this approximation is. If both initial and boundary conditions belong to a
system of differential equations related to problems in mechanics, the simultaneous
handling of the different conditions is difficult.

For solving numerical problems in the case of loading changing in time,
generally two approaches are discussed in the literature related to boundary
value problems.

1. The problem is solved by fixing the values of the load function at several
given times. In this way, an arbitrary number of function values defining the change
of the state variables depending on time can be determined. In this case, the influence
of values calculated in different periods cannot be directly taken into account.

2. The problem is solved by supposing separability of the function in the
independent variables related to place and time, which seems to be a strong
restriction.

The elaboration of a solution method different from the previous ones is justified
by the above. Section 2 deals with the relation between problems of calculus of
variations and mathematical programming. In Section 3, two examples show
how to transform variational problems obtained by the Hu-Washizu
principle into mathematical programming ones in such a way that the
solutions provided by the two different methods are equal.

2. Relation between problems of calculus of variations and mathematical 
programming 

Let A be a set. Consider the parametric problems of calculus of variations

$$\begin{array}{l}
\min J(x) = \displaystyle\int_{t_1}^{t_2} f(t, x(t)) \, dt, \\[6pt]
x(t) = (x_1(t), \ldots, x_n(t)) \in \bar{C}^n[t_1, t_2], \\[4pt]
f(t, x) \in C([t_1, t_2] \times A), \quad f: [t_1, t_2] \times A \to R,
\end{array} \qquad (2.1)$$

where the aim is to determine the unknown vector function $x(t)$. Here,
$C([t_1, t_2] \times A)$ and $\bar{C}[t_1, t_2]$ denote the class of continuous functions on the set
$[t_1, t_2] \times A$ and of piecewise continuous functions on the interval $[t_1, t_2]$,
respectively. Consider the optimization problem

$$\min_{x \in A} f(t, x), \qquad (2.2)$$

where the minimum must be determined for every fixed value $t \in [t_1, t_2]$.

Definition 2.1. Let $A \subseteq R^n$ be an arbitrary subset. It is said that the set
$R_0 = [t_1, t_2] \times A \subseteq R^{n+1}$ is feasible if for every element $(t_0, x_0) \in [t_1, t_2] \times A$,
there exists $x(t)$, $t \in [t_0 - \delta, t_0 + \delta]$, a continuous vector function for which
$x(t_0) = x_0$ and $(t, x(t)) \in [t_0 - \delta, t_0 + \delta] \times A$ hold.

A vector function $x(t)$, $t \in [t_1, t_2]$, is feasible if
$(t, x(t)) \in R_0$, $x(t) \in \bar{C}^n[t_1, t_2]$. In Hestenes' book, the equivalence of
problems (2.1) and (2.2) is stated as follows:

Theorem 2.1. Functional $J(x)$ is minimized by a feasible vector function $x_0(t)$,
$t \in [t_1, t_2]$, over the feasible piecewise continuous functions defined on the
feasible region $R_0 = [t_1, t_2] \times A$ if and only if the inequality

$$f(t, x) \ge f(t, x_0(t)), \quad t \in [t_1, t_2], \qquad (2.3)$$

is fulfilled for every element $(t, x) \in R_0$.

If the vector function $x_0(t)$, $t \in [t_1, t_2]$, is the minimum of functional $J(x)$,
then $f(t, x_0(t))$ is continuous on $t \in [t_1, t_2]$.



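Let us examine the case when, instead of (2.1), the problem is given in the
form of

$$\min J_1(x, \upsilon, \mu) = \int_{t_1}^{t_2} L(t, x(t), \upsilon(t), \mu(t)) \, dt, \qquad (2.4)$$

$$x(t), \upsilon(t), \mu(t) \in \bar{C}[t_1, t_2], \quad L(t, x, \upsilon, \mu) \in C([t_1, t_2] \times A \times B \times D),$$

where $A \subseteq R^n$, $B \subseteq R^m$, $D \subseteq R^q$.

Let us consider

$$\begin{array}{l}
L(t, x, \upsilon, \mu) = f(t, x) + \displaystyle\sum_{j=1}^{q} \mu_j h_j(t, x) + \sum_{i=1}^{m} \upsilon_i g_i(t, x), \\[6pt]
f, h_j, g_i \in C([t_1, t_2] \times A), \quad j = 1, \ldots, q, \quad i = 1, \ldots, m, \\[4pt]
\upsilon(t) \in \bar{C}^m[t_1, t_2], \quad \mu(t) \in \bar{C}^q[t_1, t_2].
\end{array} \qquad (2.5)$$

Mathematical programming problems of type (2.2) corresponding to problems
of calculus of variations (2.4) can be stated in the form of

$$\begin{array}{l}
\min L(t, x, \upsilon, \mu) \\[4pt]
(t, x, \upsilon, \mu) \in [t_1, t_2] \times A \times B \times D, \\[4pt]
L(t, x, \upsilon, \mu) \in C([t_1, t_2] \times A \times B \times D), \\[4pt]
L: [t_1, t_2] \times A \times B \times D \to R.
\end{array} \qquad (2.6)$$

In this case, we have

Corollary 2.1. Functional $J_1(x, \upsilon, \mu)$ is minimized by a feasible vector function
$(x_0(t), \upsilon_0(t), \mu_0(t))$, $t \in [t_1, t_2]$, over the feasible piecewise
continuous functions defined on the feasible region $R_0 = [t_1, t_2] \times A \times B \times D$ if
and only if the inequality

$$L(t, x, \upsilon, \mu) \ge L(t, x_0(t), \upsilon_0(t), \mu_0(t)), \quad t \in [t_1, t_2], \qquad (2.7)$$

is fulfilled for every element $(t, x, \upsilon, \mu) \in R_0$.
If the vector function $(x_0(t), \upsilon_0(t), \mu_0(t))$, $t \in [t_1, t_2]$, is the minimum of
functional $J_1(x, \upsilon, \mu)$, then the function $L(t, x_0(t), \upsilon_0(t), \mu_0(t))$ is
continuous on the interval $t \in [t_1, t_2]$.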

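Minimization problems (2.6) may correspond to a pair of constrained
optimization problems:

$$\begin{array}{l}
\min f(t, x) \\[4pt]
h_j(t, x) = 0, \quad j = 1, \ldots, q, \\[4pt]
g_i(t, x) \ge 0, \quad i = 1, \ldots, m, \\[4pt]
f, h_j, g_i \in C([t_1, t_2] \times A), \quad j = 1, \ldots, q, \quad i = 1, \ldots, m, \\[4pt]
f, h_j, g_i: [t_1, t_2] \times A \to R, \quad j = 1, \ldots, q, \quad i = 1, \ldots, m,
\end{array} \qquad (2.8)$$

$$\begin{array}{l}
\nabla_x f(t, x) + \displaystyle\sum_{j=1}^{q} \mu_j \nabla_x h_j(t, x) + \sum_{i=1}^{m} \upsilon_i \nabla_x g_i(t, x) = 0, \\[6pt]
\upsilon_i g_i(t, x) = 0, \quad \upsilon_i \ge 0, \quad i = 1, \ldots, m, \quad (t, x, \upsilon, \mu) \in [t_1, t_2] \times A \times B \times D, \\[4pt]
f, h_j, g_i \in C([t_1, t_2] \times A), \quad j = 1, \ldots, q, \quad i = 1, \ldots, m.
\end{array} \qquad (2.9)$$

The relations between the optimality conditions of problems (2.6), (2.7) and
(2.8) are given by the Lagrange theorem and the duality theorem (see, e.g., Hestenes'
book). Thus, problems of calculus of variations can be transformed into
unconstrained or constrained optimization problems.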



3. Applications in elastic analysis of structures 

Elastic problems can be formulated by the Hu-Washizu functional in the form of 

min n. = J||[/l(e(x))-P(x)*u(x)]dF -|n[(e(x)-Bu(x))*(T(x)]dV 

V V 

- J|P(x)%(x)dS- n«(x))%(x)^f5. 



(3.1) 
where $\Pi_I$ is the potential energy function, $V$ the volume of the structure, $x$ the
position vector, $\varepsilon(x)$ the vector of the strain functions, $A(\varepsilon(x))$ the elastic energy
function, $P(x)$ the vector of the external load functions, $u(x)$ the vector function
of displacement, $B$ the transfer matrix which contains the corresponding
derivatives of $u(x)$, $\sigma(x)$ the vector function of stress, $S_I$ the force boundary,
$S_{II}$ the displacement boundary, and the overbar and $*$ denote a prescribed value
and the transpose, respectively.

In some cases, problems (3.1) lead to problems of calculus of variations, as
will be shown in the following two examples.

First, the deflection function of a simply supported beam will be approximated
with an orthogonal function system. Let us consider a simply supported beam
of length $2\pi$ and unit rigidity. The external load is $q(x) = 2\sin(x) + 3\sin(3x)$ and
the problem is to determine the deflection function $u(x)$. Under the boundary
conditions $u(0) = 0$, $u(2\pi) = 0$ and $u''(0) = 0$, $u''(2\pi) = 0$, the problem of calculus of
variations is

$$\min J(u) = \int_0^{2\pi} \left[ \frac{1}{2} \left( \frac{d^2 u(x)}{dx^2} \right)^2 + q(x) u(x) \right] dx. \qquad (3.2)$$

In problem (3.2), choose a basis $N_i$, $i = 1, \ldots, 6$, in the class of the feasible
functions in the form of

$$\begin{array}{lll}
N_1(x) = \sin(x), & N_2(x) = \sin(2x), & N_3(x) = \sin(3x), \\
N_4(x) = \cos(x), & N_5(x) = \cos(2x), & N_6(x) = \cos(3x).
\end{array} \qquad (3.3)$$

In order to satisfy the boundary conditions, the functions $N_4$, $N_5$ and $N_6$ will
leave the basis; then the unknown function $u(x)$ and the given function
$q(x)$ will be expressed in the basis as follows:

$$u(x) = \sum_{i=1}^{3} a_i N_i(x), \quad q(x) = \sum_{i=1}^{3} b_i N_i(x), \quad a \in R^3, \qquad (3.4)$$

where the multipliers $a_1, a_2, a_3$ are unknown.

The boundary conditions are satisfied by the basis:

$$u(0) = \sum_{i=1}^{3} a_i \sin(i \cdot 0) = 0, \quad u(2\pi) = \sum_{i=1}^{3} a_i \sin(i \cdot 2\pi) = 0,$$

$$u''(0) = \sum_{i=1}^{3} -i^2 a_i \sin(i \cdot 0) = 0, \quad u''(2\pi) = \sum_{i=1}^{3} -i^2 a_i \sin(i \cdot 2\pi) = 0.$$

Thus, we have that

$$\min_{a} \int_0^{2\pi} \left[ \frac{1}{2} \left( \sum_{i=1}^{3} a_i \frac{d^2 N_i(x)}{dx^2} \right)^2 + \sum_{i=1}^{3} b_i N_i(x) \sum_{i=1}^{3} a_i N_i(x) \right] dx, \quad a \in R^3. \qquad (3.5)$$

By differentiation in (3.5), we obtain that

$$\min_{a} \int_0^{2\pi} \left[ \frac{1}{2} \sum_{k=1}^{3} a_k (-k^2) \sin(kx) \sum_{j=1}^{3} a_j (-j^2) \sin(jx) + \sum_{j=1}^{3} b_j \sin(jx) \sum_{k=1}^{3} a_k \sin(kx) \right] dx. \qquad (3.6)$$

By using orthogonality, problems (3.7) and (3.8) are obtained:

$$\min_{a} \int_0^{2\pi} \left[ \frac{1}{2} \sum_{k=1}^{3} k^4 a_k^2 \sin^2(kx) + \sum_{k=1}^{3} a_k b_k \sin^2(kx) \right] dx, \quad a \in R^3, \qquad (3.7)$$

$$\min_{a} \left[ \frac{1}{2} \sum_{k=1}^{3} k^4 a_k^2 \int_0^{2\pi} \sin^2(kx) \, dx + \sum_{k=1}^{3} a_k b_k \int_0^{2\pi} \sin^2(kx) \, dx \right], \quad a \in R^3. \qquad (3.8)$$

By the Lagrange multiplier rule, the stationary function can be determined as
follows:

$$k^4 a_k + b_k = 0, \quad k = 1, 2, 3,$$

from which

$$a_k = -\frac{b_k}{k^4}, \quad k = 1, 2, 3.$$

In the case of $q(x) = 2\sin(x) + 3\sin(3x)$, $b_1 = 2$, $b_2 = 0$, $b_3 = 3$, the approximation of
the unknown function is

$$u(x) = -2\sin(x) - \frac{1}{27}\sin(3x). \qquad (3.9)$$

Solution (3.9) is obtained by calculus of variations.

Now, let us consider problem (3.7) and the corresponding mathematical
programming one based on the correspondence $t \Rightarrow x$,
$x_k(t) \Rightarrow a_k \sin(kx)$, $k = 1, 2, 3$; then

$$(a_1 N_1(x), a_2 N_2(x), a_3 N_3(x)) \in C^3[0, 2\pi],$$

$$f(t, x(t)) \Rightarrow f\Big(x, \sum_{k=1}^{3} a_k N_k(x)\Big) \in C([0, 2\pi] \times R^3), \quad f: [0, 2\pi] \times R^3 \to R.$$

Let $y_k(x) = a_k \sin(kx)$, $k = 1, 2, 3$; then the parametric mathematical
programming problem can be formulated as the following unconstrained
optimization problem for every fixed value $x \in [0, 2\pi]$:

$$\min_{y} \left[ \frac{1}{2} \sum_{k=1}^{3} k^4 y_k^2 + \sum_{k=1}^{3} b_k \sin(kx) \, y_k \right]. \qquad (3.10)$$

By the Lagrange multiplier rule,

$$k^4 y_k + b_k \sin(kx) = 0, \quad k = 1, 2, 3,$$

from which

$$y_k = -\frac{b_k \sin(kx)}{k^4}, \quad k = 1, 2, 3. \qquad (3.11)$$

So, $a_1 = -2$, $a_2 = 0$, $a_3 = -1/27$, and this solution is equivalent to (3.9).
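The agreement of the two approaches in this example can be checked numerically. The sketch below evaluates the coefficients $a_k = -b_k / k^4$ from the variational solution and compares the resulting deflection with the sum of the pointwise minimisers $y_k(x)$ of (3.11) on a grid; the grid size and all names are illustrative choices.

```python
import math

# Load coefficients of q(x) = 2 sin(x) + 3 sin(3x) in the sine basis.
b = {1: 2.0, 2: 0.0, 3: 3.0}

# Variational solution: stationarity k^4 a_k + b_k = 0.
a = {k: -b[k] / k ** 4 for k in b}


def u(x):
    """Deflection (3.9) assembled from the coefficients a_k."""
    return sum(a[k] * math.sin(k * x) for k in a)


def y(k, x):
    """Pointwise minimiser (3.11) of the parametric problem for fixed x."""
    return -b[k] * math.sin(k * x) / k ** 4


# Compare both solutions on a grid over [0, 2*pi].
grid = [2 * math.pi * i / 50 for i in range(51)]
max_gap = max(abs(u(x) - sum(y(k, x) for k in b)) for x in grid)
```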

b.) Secondly, the deflection function of a simply supported beam is
approximated with a nonorthogonal polynomial system. Let us consider a simply
supported beam with unit rigidity and unit length. The external load is
$q(x) = x - x^4$. The boundary conditions require that at the supports the values of
the deflection $u(x)$ and of the moment function $M(x)$ are zero:

$$M(0) = 0, \quad M(1) = 0,$$

$$u(0) = \int_0^0 \int_0^x M(\xi) \, d\xi \, dx = 0, \quad u(1) = \int_0^1 \int_0^x M(\xi) \, d\xi \, dx = 0.$$

The nonorthogonal function system is the following:

$$p_0(x) = x(x - 1), \quad p_1(x) = x^2(x - 1), \quad p_2(x) = x^3(x - 1). \qquad (3.12)$$

The moment function $M(x)$ is approximated by

$$M(x) \approx \sum_{j=0}^{2} \sum_{i=0}^{2} a_i p_j(x), \quad a \in R^3, \qquad (3.13)$$

where

$$a_j = \int_0^1 M(x) p_j(x) \, dx.$$

The second derivatives are

$$\frac{d^2 p_0(x)}{dx^2} = 2, \quad \frac{d^2 p_1(x)}{dx^2} = 6x - 2, \quad \frac{d^2 p_2(x)}{dx^2} = 12x^2 - 6x.$$

The third boundary condition is not satisfied by the polynomial system $p_i(x)$,
$i = 0, 1, 2$.

The problem of calculus of variations based on (3.1) is in the form of

$$\min \int_0^1 \left\{ \frac{1}{2} (M(x))^2 + u(x) \left[ \frac{d^2 M(x)}{dx^2} - q(x) \right] + F \int_0^1 \int_0^x M(\xi) \, d\xi \, dx \right\} dx, \qquad (3.14)$$

where F is a constant; mechanically, F is the reaction force.

The approximation of the second derivative of the moment function is

$$\frac{d^2 M(x)}{dx^2} \approx \sum_{j=0}^{2} \sum_{i=0}^{2} a_i \frac{d^2 p_j(x)}{dx^2}.$$



The second derivatives are given in the basis as follows: 

£ I 

dx^ y = 0/ = o 

\d^pAx) 

where c. . = f j p (x)dx, Uj ~ 0,1,2. 

U dx 

The third boundary condition can be expressed by the polynom system in the 
form of 



\x 2 2 2 2 2 2 

JJ Z Z a.pmd^= Z Z a.\\pmd^= z z aJ., 

00j = 0i = 0 ^ j = 0i = 0 00 j = 0i^0 



2 2 2 2 
F= Z Z k.pp), F Z Z 
y=o/=o y = o/=o 

Substituting these approximations 
variations is as follows: 



2 2 2 2 

a.d.= Z Z Z 1 a.k,d,p.(x). 

’ J J^0i = 0k = 0l = 0 ' " ‘ ^ 
for (3.14), the problem of calculus of 
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0 ^‘^Vy=0i=0 / y=0i=0 y=Oi=0 y=0i=0 j 

(3.15 

y=0i=0Jt=0/=0 

2 2 

where m(x:) = ^^6,/?y(x), the values bj, i=0,l,2, are variables, and the 

y = 0;=0 

^ 2 

given load is expressed in the basis, i.e., q(x) = x - = X P • (^) > thus 

j = 0 ^ 

r,=l,i = 0,1,2. 

Now, the corresponding mathematical programming problem is formulated 
by introducing the notations =a,/?y(x), Zy=bfPj{x), 

i, j = 0, 1 ,2 , is given in the form of 

^^\j=0i=0 J ;=0i=0 j=0i=0 ;=0 j=0i=0k=0t=0 ^ 

(3.16) 

for every fixed x e[0,l] , where the unknowns are z,y , yy, and kj, i„j =0,1,2. By 
the Lagrange multiplier rule, 

j = 0i = 0 J j = 0i = 0 •' i = 0j = 0 

i 

J J j = Q J 
? 9 2 

« . 4 . 1 



i s s 

/ = 0j = 0i = 0 •’ 

In order to determine the moment function $M(x)$, we use the second equality of
(3.17) for every $x \in [0, 1]$, the approximation $y_{ij} = a_i p_j(x)$, $i, j = 0, 1, 2$, and
the linear independence of the basis functions, from which it follows that in our
case

$$\begin{bmatrix} c_{00} & c_{10} & c_{20} \\ c_{01} & c_{11} & c_{21} \\ c_{02} & c_{12} & c_{22} \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}. \qquad (3.18)$$

The system is solved by MATHCAD:

    p0(x) := x (x - 1)          p1(x) := x^2 (x - 1)        p2(x) := x^3 (x - 1)
    dp0(x) := d/dx p0(x)        dp1(x) := d/dx p1(x)        dp2(x) := d/dx p2(x)
    ddp0(x) := d/dx dp0(x)      ddp1(x) := d/dx dp1(x)      ddp2(x) := d/dx dp2(x)

    C[0,0] := int_0^1 ddp0(x) p0(x) dx    C[0,1] := int_0^1 ddp0(x) p1(x) dx    C[0,2] := int_0^1 ddp0(x) p2(x) dx
    C[1,0] := int_0^1 ddp1(x) p0(x) dx    C[1,1] := int_0^1 ddp1(x) p1(x) dx    C[1,2] := int_0^1 ddp1(x) p2(x) dx
    C[2,0] := int_0^1 ddp2(x) p0(x) dx    C[2,1] := int_0^1 ddp2(x) p1(x) dx    C[2,2] := int_0^1 ddp2(x) p2(x) dx

    b := (1, 1, 1)^T            a := C^(-1) b               a = (-5, 25, -35)^T

         | -0.333  -0.167  -0.1   |               | -15    45   -35 |
    C =  | -0.167  -0.133  -0.1   |      C^(-1) = |  45  -195   175 |
         | -0.1    -0.1    -0.086 |               | -35   175  -175 |

    x := 0, 0.1 .. 1

    q(x) := p0(x) + p1(x) + p2(x)

    M0(x) := a_0 p0(x) + a_0 p1(x) + a_0 p2(x)
    M1(x) := a_1 p0(x) + a_1 p1(x) + a_1 p2(x)
    M2(x) := a_2 p0(x) + a_2 p1(x) + a_2 p2(x)
    M(x) := M0(x) + M1(x) + M2(x)

The result agrees with the solution obtained by the finite element method.
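The worksheet computation can be reproduced with exact rational arithmetic. The following sketch is an independent re-implementation, not the MATHCAD code: it assumes the basis $p_i(x) = x^{i+1}(x - 1)$, $i = 0, 1, 2$, stores polynomials as exponent-to-coefficient dictionaries, and solves the 3 x 3 system by Gauss-Jordan elimination. All function names are illustrative.

```python
from fractions import Fraction


def basis(i):
    """p_i(x) = x^(i+1) (x - 1) = x^(i+2) - x^(i+1) as {exponent: coefficient}."""
    return {i + 2: Fraction(1), i + 1: Fraction(-1)}


def second_derivative(poly):
    """Second derivative of a polynomial given as {exponent: coefficient}."""
    return {e - 2: c * e * (e - 1) for e, c in poly.items() if e >= 2}


def integrate_product(f, g):
    """Integral over [0, 1] of the product of two polynomials."""
    return sum(cf * cg / (ef + eg + 1)
               for ef, cf in f.items() for eg, cg in g.items())


def solve(A, rhs):
    """Gauss-Jordan elimination with exact fractions."""
    size = len(A)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][size] for i in range(size)]


# C[i][j] = integral of p_i''(x) p_j(x) over [0, 1].
C = [[integrate_product(second_derivative(basis(i)), basis(j)) for j in range(3)]
     for i in range(3)]
a = solve(C, [Fraction(1)] * 3)
```

The entries of `C` are the exact fractions behind the rounded worksheet values (for instance -1/3 and -3/35), and `a` reproduces the coefficient vector (-5, 25, -35).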
Abstract The sensitivity to changes in parameters of 12 common deterministic and
stochastic search methods has been investigated by measuring the time they need to
approach the known minimum within a preset range of function values. These search
methods use or do not use derivatives and depend on 1-5 parameters; the choice of these
parameters largely determines the experimental search time. An example demonstrates
that an improper choice of parameters reverses the order of efficiency of search methods,
i.e. the fastest method becomes the slowest one. To overcome the difficulty of choosing
the parameters properly, the optimisation time was measured as a function of these
parameters. The optimal parameter vector and the optimal search time were then
determined over the parameter space, separately for each search method and each
function. The functions of optimisation time versus parameter are presented. Their
trend, their structure and the position of the optimum are discussed.

The optimal parameters, run times, number of function values, number of decreasing
function values, number of iterations required to pass half the distance from the value
of the test function at the start to its minimum, etc., at the position of the optimum of
the optimisation time are discussed as parts of a characterising criterion vector with
some 40 components, which is typical for each search method and test function.

The practical convergence speed of each search method has been determined by
analysing the distribution of function values during the iteration towards the minimum
and their iteration times. The sensitivity of the optimal parameters to changes in the
feasibility parameter, the start vector and the distribution of random number vectors in
random search (R.S.) methods has been investigated experimentally. In separate programs
the function error was investigated for each search method; appropriate termination
criteria were selected. By increasing some sample size parameter in many-parameter
R.S. methods the feasible region could be reached again.

1. Introduction 

1.1. Preliminary Notes 

There are numerous publications on deterministic and stochastic search methods that
search for the optimum of a function, see e.g. [4, 9] and surveys in [10, 14, 15, 17, 18].
They investigate the advantages or disadvantages of different methods either
analytically or experimentally. In order to do so, a set of parameters inherent to the
search method under investigation is “suitably” chosen. An attempt is made to find one
set of parameters “good for a class of functions”.

In this article no further survey on search methods will be added to the existing ones.
Rather, the optimal choice of parameters in each search method, taken from a sample
of stochastic and deterministic methods, is examined experimentally. How does this
choice affect the “efficiency” of a search method? Does the “efficiency” of deterministic
and stochastic methods differ? When does an “efficient method” turn into an “inefficient
method”? How many parameters may a search method have and still be manageable in
practice? Does the same set of parameters hold good for all functions? How fast are
these methods in reality? What can the engineer do to find an unknown optimum?
These questions result naturally from practical considerations. Nevertheless, the
parameter problem outlined here seems to have been avoided in the literature so far, due
to the large central processor times needed to obtain well-established experimental
results and due to the elaborate structure of the testing program that is required.

1.2. Definition of the Problem 



The minimisation problem

$$F^* = \min F(x) \quad \text{s.t.} \quad x \in \mathbb{R}^n \qquad (1)$$

shall be solved with different parameter-dependent search methods

$$M_i = M_i(c), \quad \text{where} \quad c' = (c_0, c_1, c_2, \ldots, c_m) \in P_i, \quad i = 1, 2, \ldots, 12, \qquad (2)$$

which are either random or deterministic search methods. Here, P_i denotes the space of
admissible parameter vectors for method M_i(c).

Starting from (x^0, F(x^0)) they generate a set of iterations

(3)

that approximate the solution according to some convergence criterion, e.g.

$$|F(x) - F^*| < \varepsilon_F, \quad |x - x^*| < \varepsilon_x, \quad |F^{k+1} - F^k| < \delta \quad \text{etc.} \qquad (4)$$



The solution (x, F) is obtained after NX iterations, NR generated random numbers, NF
function values, ..., NDF calculated gradients, ..., NDEMI function values required to
pass F_{1/2} = F^0 - 0.5·(F^0 - F*), NFDOWN decreasing function values, NFEPS calls to the
convergence criterion, ..., NFT function values per time, the central processor time RT
required to solve (1), etc. These counting variables may be used as time-dependent and
time-independent components of a criterion vector CR characteristic of method M_i(c):

CR = (NX, NR, NF, NDF, ..., NDEMI, NFDOWN, NFEPS, ..., NFT, ..., RT, ...) (5)

One is tempted to use vector CR as a numerical scale for the performance of a search
method. However, practical reasons prevent this, apart from the fact that vector CR has
some 40 components to consider: some components are not present in all search
methods, and each component of CR is, in general, a different unknown nonlinear
function of vector c in M_i(c). Using CR as a measure of performance would require
establishing (experimentally) that some sort of vector optimisation problem can be well
defined for any method and any function F(x). Hence, the performance of a search
method M_i(c) is evaluated by the optimal run time RT* to solve (1):

$$RT^* = \min RT \quad \text{s.t.} \quad c \in P_i \subset \mathbb{R}^{m+1} \quad \text{for each } M_i(c); \qquad c^* = \arg\min RT \qquad (6)$$




The problem of comparing different search methods will then be decided
a) numerically, by RT*, NX(c*), NF(c*), ..., and b) by evaluating the practical
applicability of M_i(c) to solve (1) when the parameter vectors c are changed.
Absolute times measured with one computer are required for a). Relative times RT are
needed to determine c* and the structure of the RT(c) curve. Comparisons of different
M_i(c) on other computers can be made by considering their relative RT(c) relationships.
Of course, the use of RT as a measure of the performance of one search method in
comparison to others depends entirely on the reproducibility of RT on the computer in use.
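
The definition of c* as the minimiser of the run time over the admissible parameters can be illustrated by a small grid scan. The sketch below is only an illustration of that idea: the function names are hypothetical, and the "run time" is replaced by a deterministic iteration count of a constant-step gradient iteration on the toy function F(x) = x^2, not by measured processor time.

```python
from statistics import mean


def locate_optimum(run_time, grid, nrun=1):
    """Return (c_star, rt_star): the grid point with the smallest mean run time."""
    averaged = {c: mean(run_time(c) for _ in range(nrun)) for c in grid}
    c_star = min(averaged, key=averaged.get)
    return c_star, averaged[c_star]


def iterations_to_converge(c, eps=1e-6, max_iter=10_000):
    """Toy stand-in for RT(c): constant-step gradient iterations on F(x) = x^2."""
    x = 1.0
    for k in range(1, max_iter + 1):
        x -= c * 2.0 * x       # the gradient of x^2 is 2x
        if x * x < eps:        # |F - F*| < eps with F* = 0
            return k
    return max_iter            # stop limit reached; counted as the cost


grid = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
c_star, rt_star = locate_optimum(iterations_to_converge, grid)
```

For a stochastic method one would pass `nrun` greater than 1 so that each grid point is averaged over repeated runs.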

1.3. Notes on Data Collection

1.3.1. Means and Standard Errors: Choice of the Number NRUN of Repetitions

After NRUN optimisations to solve (1), means μ and their standard errors σ were
calculated for each component of criterion vector CR. When the search was
deterministic, σ was set equal to 0 in the time-independent components of CR.

Commonly, averaging takes place with data from “successful” runs only, i.e. data from
runs that exceeded time or iteration limits are omitted. Hence μ and σ, e.g. for RT,
are underestimated. Experimental RT(c) data demonstrated [2] that RT flattened
out even if fewer than 10 of 1000 runs were omitted. Therefore the accuracy desired for
the RT(c) function required all runs in the present investigation to be “successful”.
Consequently, throughout this study, the stop limits were increased adequately.

The number of repetitions NRUN was determined experimentally for each search
method for accurate localisation of min RT(c). In particular, results from random search
methods showed that narrow minima can easily be overlooked for NRUN ≤ 100
because the standard errors σ of RT are too large. Also, the means of the random
components of vector CR asymptotically approach a constant value with increasing NRUN:
experimentally, e.g. RT ≈ const. for NRUN ≥ 500 in this study. Hence, in R.S. methods
optimisations were usually repeated 1000 to 30000 times for each value of c.
Deterministic methods required substantially fewer repetitions due to the stability of
the central processor time.
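
The averaging step can be sketched as follows. This is a minimal illustration that assumes the usual definition of the standard error of the mean (sample standard deviation divided by the square root of NRUN); the function name is hypothetical and the article does not spell out its formula.

```python
import math


def mean_and_standard_error(samples):
    """Return (mean, standard error of the mean) of NRUN repeated measurements."""
    n = len(samples)
    mu = sum(samples) / n
    if n < 2:
        return mu, 0.0  # a single (e.g. deterministic) value carries no scatter estimate
    variance = sum((s - mu) ** 2 for s in samples) / (n - 1)  # unbiased sample variance
    return mu, math.sqrt(variance / n)


mu, sigma = mean_and_standard_error([1.0, 2.0, 3.0, 4.0, 5.0])
```

Because the standard error falls only with the square root of NRUN, halving it requires four times as many runs, which is one way to read the large repetition counts quoted above.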

1.3.2. Notes on the Choice of ε in Convergence Criterion |F-F*| < ε

The total time to run one program was restricted to 150 min of CPU time. Finally, 12
search methods with their optimal parameters c* set were to be applied successively
to solve (1) for performance comparisons. The demand for accurate data required
optimisation runs to be repeated 1000 times for each search method. However, each run
should regularly terminate with entry into the feasible region |F-F*| < ε, even for the
slowest method. These requirements defined experimentally the lower limit of ε for
each test function, e.g. ε = 10"^ for F1 and ε = 10"^ for F6 (see Ch. 1.4). Also, all runs
that measured the RT(c) function in order to determine the minimum position c* had
to use the same value of ε that was eventually employed in the final comparative runs.

1.3.3. The Use of the Central Processor as a Stop Watch

This report investigates the time that a search method needs to localize the
minimum of a test function. It had to be verified experimentally that this time is not
falsified by undiscovered fluctuations and drifts of time that arise from the workload of
the central processor in a job-sharing environment.

The “time stability” of the computer was investigated in separate programs. The
numerical results, see [2], of time stability investigations recorded over approximately
70 months demonstrate: the computer had time-stable states which lasted
sufficiently long for the time measurements reported here to achieve any degree of
accuracy that was practically demanded. Time data RT(c) were reproducible on any
day of measurement. As the optimisation time and the scatter from one program were
recorded day by day, any change in the time stability of the central processor was
reliably observed. Thus final data were obtained only from a time-stable central processor.

1.4. Relationship Between Run Time and Parameters of Search Methods 

In order to determine numerically the relationship between the run time RT and the
parameters of a given optimisation method, two test functions [6] were used:

F1 Rosenbrock Function, ε = 10“^

$$F(x) = 100\,(x_1^2 - x_2)^2 + (1 - x_1)^2 \qquad (7)$$

x^0 = (-1.2, 1.0), F(x^0) = 24.2, x* = (1, 1), F* = 0

F6 Engvall Function, ε = 10"^

$$F(x) = x_1^4 + x_2^4 + 2 x_1^2 x_2^2 - 4 x_1 + 3 \qquad (8)$$

x^0 = (0.5, 2.0), F(x^0) = 19.06, x* = (1, 0), F* = 0

The convergence limit ε was kept constant throughout this study. Its value was
determined for each test function separately (see Ch. 1.3.3 and 4.1) to enable fast and
accurate data collection for all search methods within the practical time limits set by the
university’s computer centre (9000 sec per job). Hence ε differed for F1 and F6. The
starting vectors for F1 and F6 remained unchanged, apart from one independent study
of their influence (see Ch. 4.2) on optimisation data.
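
The two test functions are easy to state in code, which also allows the quoted start and minimum values to be checked. The sketch below is an illustration only; the function names are chosen here and are not taken from the original programs.

```python
def rosenbrock(x):
    """Test function F1, eq. (7)."""
    x1, x2 = x
    return 100.0 * (x1 ** 2 - x2) ** 2 + (1.0 - x1) ** 2


def engvall(x):
    """Test function F6, eq. (8)."""
    x1, x2 = x
    return x1 ** 4 + x2 ** 4 + 2.0 * x1 ** 2 * x2 ** 2 - 4.0 * x1 + 3.0


f1_start = rosenbrock((-1.2, 1.0))   # start value of F1
f6_start = engvall((0.5, 2.0))       # start value of F6
f1_min = rosenbrock((1.0, 1.0))      # value at the minimum position of F1
f6_min = engvall((1.0, 0.0))         # value at the minimum position of F6
```

Evaluating these expressions gives the start values 24.2 and 19.0625 (quoted as 19.06) and the minimum value 0 for both functions.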

1.5. Search Methods 

The iteration process to optimize F(x) is defined by the search method selected from a
pool of 12 available deterministic and random search methods:

8 with a single-component parameter vector and 4 with more than 2 components.
Table 1.1 summarizes the investigated search methods. M8, M9 and M10 are 3-, 4- and
5-parameter random search methods with modified increments Δx, whose components are
not normalized. Nevertheless, the popular names are kept for ease of reference
throughout. A brief description will be given; the parameter names mentioned here are
used to label the axes in graphical data representations.

M8: FSSRS, Fixed Step Size Random Search [13, 17]

Starting from a previous best vector XOLD, a number of “NXOLD” further reference
vectors are generated consecutively. From each one, progress is attempted in
“NSTEP” random trial directions Δx with different step lengths; the components of Δx are
N(0,c) distributed. In FSSRS with reversing (“NSTEP < 0”) the reversed directions
-Δx are tested, too: a second x value is thus obtained deterministically and is available
for comparison without the time-consuming generation of random numbers. The iteration
continues in the direction of steepest descent to obtain the next reference point. After
generation of this star-like search pattern the convergence criterion is examined.

Table 1.1. Search Methods

Iteration rule:
  Deterministic methods:  x^{k+1} = x^k + Δx^k
  Stochastic methods:     x^{k+1} = x^k + Δx^k   if F(x^k + Δx^k) < F(x^k)
                          x^{k+1} = x^k          if F(x^k + Δx^k) ≥ F(x^k)

Symbol  Type           Δx^k = (Δx_1, ..., Δx_i, ..., Δx_n)'
M7      Stochastic     c_0·Z(ω), Z_i(ω): uniform distribution on [-1, 1]
M1      Stochastic     c_0·Z(ω), Z_i(ω): N(0,1) distribution
M2      Stochastic     c_0·Z(ω), Z_i(ω): N(0, |∂F/∂x_i|) distribution
M3      Stochastic     c_0·Z(ω), Z_i(ω): N(0, |∂²F/∂x_i²|) distribution
M4      Deterministic  -c_0·∇F(x^k), gradient method: constant step width
M5      Deterministic  quasi-Newton method using first and second derivatives of F
M6      Deterministic  -c_0·H^-1·∇F(x^k), Newton method
M8      Stochastic     3-parameter fixed step size random search, FSSRS
M9      Stochastic     4-parameter random pattern search, RANPAT
M10     Stochastic     5-parameter adaptive step size random search, ASSRS
M11     Deterministic  -(c_0/k)·∇F(x^k), gradient method: variable step width
M12     Deterministic  -c_0·p*·∇F(x^k), p* = arg min F[x^k - p∇F(x^k)] (line search)

M9: RANPAT, Randomized Pattern Search [7, 17]

In addition to FSSRS, in each search star an extended step x - BEF*(XBEST - x) in
the direction of steepest descent (XBEST - x) is made to check for a possible further
decrease of F(x). Hence, in RANPAT a 4th parameter, an accelerator BEF > 1, is
implemented. Again, no use is made of the random number generator for this step.

M10: ASSRS, Adaptive Step Size Random Search [1, 11, 12, 16]

In principle ASSRS uses a local “radius r” which, in the modified version used here,
refers to the N(0,r) distribution of the components of Δx. In case of success radius r is
increased by means of an accelerator BEF > 1: r := r · BEF. In case of failure for
KSTEUR ≥ 1 consecutive sample points radius r is decreased by means of a retardation
factor VERF < 1: r := r · VERF. Consequently, even in a simple version [11],
ASSRS has at least 5 parameters to set before the iteration process can start:

c_0: the initial “radius”, NXOLD, KSTEUR, BEF, VERF.
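
The radius adaptation described for ASSRS can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the program used in the study: the radius is treated as the standard deviation of the normal increments, NXOLD is omitted, and the function name, the seed and the sphere test function are hypothetical choices made here.

```python
import random


def assrs(f, x0, f_star, eps, c0, bef, verf, ksteur, max_iter=100_000, seed=0):
    """Adaptive step size random search (sketch); returns (x, F(x), iterations)."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    radius, failures = c0, 0
    for k in range(1, max_iter + 1):
        trial = [xi + rng.gauss(0.0, radius) for xi in x]
        f_trial = f(trial)
        if f_trial < fx:               # success: accept the point and enlarge the radius
            x, fx, failures = trial, f_trial, 0
            radius *= bef
        else:                          # failure: shrink after KSTEUR consecutive failures
            failures += 1
            if failures >= ksteur:
                radius *= verf
                failures = 0
        if abs(fx - f_star) < eps:     # convergence criterion |F - F*| < eps
            return x, fx, k
    return x, fx, max_iter


def sphere(x):
    """Simple test function with minimum F* = 0 at the origin (not from the article)."""
    return sum(xi * xi for xi in x)


x_best, f_best, n_iter = assrs(sphere, (1.0, 1.0), 0.0, 1e-3,
                               c0=0.5, bef=2.0, verf=0.5, ksteur=3)
```

Even this reduced version has four free parameters besides ε, which hints at why locating the time-optimal parameter vector becomes laborious.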
1.6. Note on Random Numbers

Random numbers for R.S. methods were provided by the UniBw Unisys A 15 random
number generator. It generates uniformly distributed random numbers on [0,1]. For M7
a uniform distribution on [-c, c] is easily derived. R.S. methods apart from M7 used
N(0,1) distributed random numbers, which were obtained according to Box-Muller [3]: if

$$X_1 = (-2 \ln Z_1)^{1/2} \cos 2\pi Z_2, \qquad X_2 = (-2 \ln Z_1)^{1/2} \sin 2\pi Z_2 \qquad (9)$$

and Z_1, Z_2 are independent and uniform on [0,1], then X_1, X_2 are N(0,1) distributed.
For R.S. methods many components of the criterion vector CR (see Ch. 1.2), e.g. the
optimisation time, are averages from successful optimisation runs. The random numbers
in each of the NRUN (see Ch. 1.3.1) repeated optimisation runs were different.
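
Transformation (9) translates directly into code. The sketch below uses Python's standard generator in place of the Unisys generator; the seed and the helper names are arbitrary choices made for this illustration.

```python
import math
import random


def box_muller(z1, z2):
    """Map two uniform numbers, z1 in (0, 1] and z2 in [0, 1], to two N(0,1) numbers."""
    radius = math.sqrt(-2.0 * math.log(z1))
    angle = 2.0 * math.pi * z2
    return radius * math.cos(angle), radius * math.sin(angle)


def normal_pair(rng):
    """Draw one pair of N(0,1) numbers from a uniform generator."""
    z1 = 1.0 - rng.random()   # random() lies in [0, 1); shift to (0, 1] so log(z1) is finite
    z2 = rng.random()
    return box_muller(z1, z2)


rng = random.Random(12345)    # fixed seed, used only to make the sketch repeatable
samples = [value for _ in range(10_000) for value in normal_pair(rng)]
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / (len(samples) - 1)
```

With 20,000 generated numbers the sample mean and variance should lie close to 0 and 1 respectively.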

2. Sensitivity of Search Methods to Changes of their Parameter Vectors

2.1. Run Times as a Function of Parameter Vector c

The optimisation problem (1) was solved with each method M_i(c), i = 1, 2, ..., 12,
separately for the Rosenbrock function F1 and the Engvall function F6, see Ch. 1.4.
Optimisation times RT(c) were measured at progressively more closely spaced parameter
vectors c so as to cover the entire permissible parameter space P_i of method M_i(c).
In general, RT(c) depends differently on c for different methods. This result refers to
the kind of structure, the position of structure within c-space and to convergence or
non-convergence. Also, RT(c) depends on the test function in (1). For any search method
and for any test function the dependence of RT on c is not negligible. When
experimentally required, any optimisation time RT ≥ RT* could be obtained by properly
adjusting parameter c.

2.2. Single Parameter Random Search Methods M7, M1, M2, M3

The shape of the RT(c) function with c = c_0 is parabola-like, see Figs. 2.1, 2.2, for
methods without as well as with derivatives. The RT(c_0) branch for small values of c_0 is
steeper than the branch for c_0 beyond the minimum position. RT(c_0) has little to no
structure apart from some irregularities of the order of a few standard errors within the
minimum region. In particular, for uniformly and normally distributed random variables
Z_i(ω), i = 1, 2, ..., n in M7 and M1 (see Table 1.1 in Ch. 1.5) the RT(c_0) functions look
very similar, see Fig. 2.1 and Fig. 2.2. Essentially, RT(c_0) for the Rosenbrock function F1
and for the Engvall function F6 do not differ in shape.
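
The single-parameter random searches M1 and M7 differ only in the distribution of the increments. A minimal sketch of both is given below; it follows the acceptance rule of Table 1.1 (a trial point is kept only if F decreases), while the function names, the seed and the sphere test function are assumptions made for this illustration.

```python
import random


def random_search(f, x0, f_star, eps, c0, uniform=False, max_iter=100_000, seed=0):
    """Single-parameter random search; uniform=True mimics M7, otherwise M1."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    nf = 1                                   # number of function values so far
    for nx in range(1, max_iter + 1):
        if uniform:
            dx = [c0 * rng.uniform(-1.0, 1.0) for _ in x]
        else:
            dx = [c0 * rng.gauss(0.0, 1.0) for _ in x]
        trial = [xi + di for xi, di in zip(x, dx)]
        f_trial = f(trial)
        nf += 1
        if f_trial < fx:                     # accept only decreasing function values
            x, fx = trial, f_trial
        if abs(fx - f_star) < eps:
            return {"x": x, "F": fx, "NX": nx, "NF": nf}
    return {"x": x, "F": fx, "NX": max_iter, "NF": nf}


def sphere(x):
    """Simple test function with minimum F* = 0 at the origin (not from the article)."""
    return sum(xi * xi for xi in x)


result_m1 = random_search(sphere, (1.0, 1.0), 0.0, 1e-2, c0=0.1)
result_m7 = random_search(sphere, (1.0, 1.0), 0.0, 1e-2, c0=0.1, uniform=True)
```

Repeating such runs over a range of `c0` values, with many seeds per value, gives the kind of RT(c_0) curve discussed above: too small a radius wastes iterations on tiny steps, too large a radius wastes them on rejected trials.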

2.3. Single Parameter Deterministic Search Methods M4, M5, M6

For the different deterministic methods M4, M5 and M6 (see Table 1.1 in Ch. 1.5) the
RT(c_0) functions were different in shape and in structure, see Figs. 2.3a, 2.3b, 2.4.
They depended considerably on the test functions.

For the gradient method M4 with constant step size, function values F(x) suddenly
exceeded any limit during iteration (“exponent overflow”) once some critical parameter
c_r was passed. Also, at isolated c-values beyond c_r optimisation times RT(c_0) became
very large unless the runs were stopped, see Fig. 2.3a for F1. RT(c_0) for F6 had several
local minima for c_0 < c_r, see Fig. 2.3b. When the deterministic methods M5 and M6 with
second derivatives were applied to solve (1) for test function F6, similar RT(c_0)
resulted: RT(c_0) had some relative minima and a sharp rise beyond a parabola-type
global minimum. However, when M5 solved (1) for F1 the RT(c_0) function was
heavily structured, with single abnormal time values within an otherwise decreasing
trend curve up to some critical parameter c_r, see Fig. 2.4. The severe structure below the
critical value c_r became apparent only when the step width Δc_0 in the parameter space
was sufficiently reduced. If Δc_0 is too coarse, relative minima in narrow valleys are
overlooked. There might also be structure in RT(c_0) for F6 for smaller Δc_0. Because of
this difficulty the minimum RT* and c_0* were intentionally obtained from the trend of
the RT(c) curve, as a replacement for the indeterminable global minimum.

2.4. Single Parameter Deterministic Search Method M11

The deterministic gradient method M11 uses, as in theoretical studies on convergence,
a variable step size c_0/k, where k denotes the iteration number (Table 1.1 in Ch. 1.5).

Table 2.1. Dependence of F(x) for F1 on Number of Iterations k 1) and on Time 1)

M11: x^{k+1} = x^k - (c_0/k)·∇F(x^k); k = iteration; c_0 = 0.006946 2); F(x^0) = 24.2; F(x*) = 0

k 1)   F(x)            Time [msec]     k 1)        F(x)            Time [msec]
2      9.915           0.5             462         4.633 × 10^-2   50
5      4.801 × 10^-2   1               927         4.607 × 10^-2   100
43     4.721 × 10^-2   5               4,635       4.548 × 10^-2   500
87     4.695 × 10^-2   10              1,296,684   4.349 × 10^-2   140,000
225    4.659 × 10^-2   25              9,622,299   4.283 × 10^-2   980,000

Notes: 1) The time stop was set to terminate the optimisation, and iteration number k
and function value F(x) were recorded.

2) F(x) was reduced most, within the given iteration times, with this choice of c_0.



Experimentally, optimisation times RT(c_0) are large and, for some ε, may in practice
prevent convergence. E.g. for F6 optimisation times for convergent runs are
considerably increased compared to those of M4 with constant step width, see Table
3.3. For the Rosenbrock function F1 iterations got stuck before they reached the
feasible region with ε = 10"^, no matter how much time was made available for the
iteration process, see Table 2.1. In order to achieve convergence for the final data, ε had
to be increased to ε = 0.05. If M11 is practically convergent at all, convergence takes
place within a very narrow range of c_0 values, which can easily be overlooked.
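
The stalling of M11 on the Rosenbrock function can be retraced with a few lines of code. The sketch below is a plain re-implementation of the iteration x^{k+1} = x^k - (c_0/k)·∇F(x^k) with the start vector and c_0 quoted in Table 2.1; the function names are chosen here and the timing aspect is left out.

```python
def rosenbrock(x1, x2):
    """Test function F1."""
    return 100.0 * (x1 ** 2 - x2) ** 2 + (1.0 - x1) ** 2


def rosenbrock_gradient(x1, x2):
    """Analytic gradient of F1."""
    d = x1 ** 2 - x2
    return 400.0 * x1 * d - 2.0 * (1.0 - x1), -200.0 * d


def m11(c0, iterations, start=(-1.2, 1.0)):
    """Run the gradient method with step width c0/k and return F after `iterations` steps."""
    x1, x2 = start
    for k in range(1, iterations + 1):
        g1, g2 = rosenbrock_gradient(x1, x2)
        step = c0 / k                 # step width shrinks with the iteration number
        x1, x2 = x1 - step * g1, x2 - step * g2
    return rosenbrock(x1, x2)


f_after_2 = m11(0.006946, 2)
f_after_5 = m11(0.006946, 5)
```

The values after two and five iterations come out at about 9.915 and 0.048, in line with the first rows of Table 2.1; from then on the shrinking step width lets F(x) decrease only very slowly.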

2.5. Many Parameter Deterministic Gradient Method M12

M12 with optimal step width p*, see Table 1.1, requires a second search method to be
activated in each iteration k to solve a separate optimisation problem (10), with
additional parameters p to be set in advance, e.g. some convergence parameter ε_s.

$$x^{k+1} = x^k - c_0\, p^* \nabla F(x^k), \qquad G_k(p) := F[x^k - p \nabla F(x^k)]$$

$$p^* = \arg\min G_k(p), \qquad G^* = \min G_k(p) \quad \text{s.t.} \quad p \in S \qquad (10)$$

S_j(p) = search method to solve (10) with unknown parameter vector p to optimise.
Over- or underestimation of p* will take place during the main iteration process
denoted by index k. Correction factor c_0 compensates the average over- or underestimate;
c_0 has to be optimised for each function F(x).

RT(c_0) has an evolved structure which is discovered only when the grid width Δc_0 for
c_0 is sufficiently reduced [2]. Observations show that the RT(c_0) function strongly
depends on the termination limit ε_s [2]. If ε_s is small, e.g. ε_s < 10’^ for F1 with ε = 10“^ or
ε_s ≤ 10’* for F6 with ε = 10"®, RT(c_0) may have a pronounced minimum RT*(c_0*, ε_s).
For small increases of c_0 above c_0* the time RT may rise beyond all limits. If ε_s is too
large, M12 may diverge: either through function values that become too large or by not
reaching the feasibility region |F - F*| < ε no matter how much time is permitted for the
search. This may occur at isolated values of the correction factor c_0 or in intervals of c_0.

It is very difficult to determine RT(c_0) for some function and ε in dependence on all
variables in the line search routine S_2(p). Much time is needed to optimize RT(c_0) or to
improve the performance of M12 in solving (1). This is a great disadvantage of the
optimal gradient method M12 compared with gradient method M4 with constant step width.
Additionally, M4 was much faster than M12 (see Table 3.3).

2.6. Many Parameter Random Search Methods 

2.6.1. R.S. Method M8: FSSRS

FSSRS (see Ch. 1.5) essentially selects in each iteration k the best function value from
a sample of NXOLD · NSTEP values before it proceeds to the next iteration k+1. The
run time function RT(c_0) has the parabola-like shape known from M1. Its minimum
position for F1 and F6 is almost independent of NXOLD and NSTEP. RT for given
NSTEP and c_0 rises with NXOLD → 0 and NXOLD → ∞, with a wide-ranging minimum
in between, see Fig. 2.5. The optimal value of NXOLD depends on the test function.
Most important is the dependence of run time RT on the number NSTEP of trial
directions from each reference point, see Fig. 2.6. For F1 and F6 time is wasted when
several descent directions are tested. Rather, one random direction and its (deterministic)
reverse (NSTEP < 0, see Ch. 1.5) should be examined before progressing to the
next reference point. The result “NSTEP = -1 is optimal” is likely to be modified for
functions f(x), x ∈ R^n and n > 2. In essence, FSSRS is faster than R.S. method M1.

2.6.2. R.S. Method M9: RANPAT

RANPAT (see Ch. 1.5) as tested here is an improved FSSRS method with a
deterministically defined, extended step from each reference point in the direction of
steepest descent. Qualitatively, for F1 and F6, the shapes of the RT(c_0), RT(NXOLD) and
RT(NSTEP) dependencies, with the remaining parameters fixed, are similar in
RANPAT and FSSRS. However, for both test functions the position c_0*(BEF) of the
minimum of the RT(c_0) function, see Fig. 2.7, and its range as well as RT* depend
considerably on the accelerator BEF > 1 (here NXOLD and NSTEP are kept constant), see
Fig. 2.8 and Fig. 2.9 for the c_0*-range and RT* as functions of BEF. When BEF increases,
the minimum position of the RT(c_0) function moves to smaller values c_0*(BEF), see
Fig. 2.7 and 2.8, so as to compensate by reducing the N(0, c_0) distributed increments
Δx. If BEF becomes too large, e.g. BEF ≥ 150 for F1 and BEF ≥ 600 for F6,
function values at iteration points with deterministic increments BEF · Δx become too
large and are calculated in vain: run times increase noticeably, and a second minimum
emerges at some position close to that of method FSSRS without any acceleration, see
Figs. 2.8 and 2.9. RANPAT is faster than FSSRS if BEF is suitably chosen.

Fig. 2.5. RT = RT(c_0, NXOLD, NSTEP) for R.S. Method M8 = FSSRS and Function F1:
Dependence of RT on NXOLD. N(0,1) distribution of random numbers; NLIMIT = 10000;
TLIMIT = 2 sec. Axes: TIME/RUN [msec] versus NXOLD.

Figure: RT = RT(..., NSTEP) for method FSSRS; NXOLD = 25. Axis: NSTEP.

2.6.3. R.S. Method M10: ASSRS

In some program mode the 5 components of parameter vector c in ASSRS (see Ch.
1.5) were changed simultaneously: c_0 ∈ R^+, NXOLD ≥ 1, BEF > 1, KSTEUR ≥ 1,
0 < VERF < 1. Measurements of the optimisation time RT(c) demonstrated for each
test function F1 and F6 (Ch. 1.4) that the initial parameter space could be reduced.
RT(c_0) was nearly independent of c_0 in [10’*^, 10®] for F1 and in [0.01, 10.0] for F6.
(RT increased by only 25 % of RT* for F6 when c_0 ∈ [10"^, 10^].) RT(NXOLD) was
similar in ASSRS to that in FSSRS and in RANPAT. In essence, RT depended critically on
the triple BEF, KSTEUR, VERF, see Figs. 2.10 - 2.12, in a different way for F1 and for
F6. Whereas for F6 the local radius should be reduced each time after failure, i.e.
KSTEUR = 1, function F1 requires only 1 ≤ KSTEUR ≤ 10 if VERF is optimal for F1,
see Fig. 2.10. If VERF had been chosen badly, KSTEUR ∈ (30, 70) would be a better
choice for F1, at the expense of optimisation time.

Fig. 2.7. RT = RT(c_0, BEF, NXOLD, NSTEP) for R.S. Method M9 = RANPAT and Function
F1: Dependence of RT on c_0; parameter: BEF. N(0,1) distribution of random numbers;
NLIMIT = 10000; TLIMIT = 2 sec.; NXOLD = 25. Axes: TIME/RUN [msec] versus c_0.

Fig. 2.8, 2.9. Dependence of optimal time RT* and range of minimum position c_0* on
parameter BEF for R.S. Method M9 = RANPAT and F1, with NXOLD = 25. N(0,1)
distribution of random numbers; NLIMIT = 10000; TLIMIT = 2 sec.; NRUN = 1000.

These few reported observations indicate that normally an improper parameter vector
c will be chosen for ASSRS, which slows down this method in solving (1), unless the
global minimum of RT s.t. c ∈ P_10 has been carefully located. E.g. for F1, RT(c) was
measured at some 10,000 values of vector c. In the present form [11], see Ch. 1.5,
ASSRS is not well suited to solve (1), but it can easily be improved [2].

In general, a practical limit to the number of components of parameter vector c in any
search method appears to be 5. This limit is suggested by the difficulties in finding
the global minimum of RT(c) in M12, in ASSRS and in RANPAT.

3. Numerical Results

3.1. Optimal Parameters of Search Methods

According to (6), RT* is the global minimum of RT(c) on the parameter space
P_i ⊂ R^{m+1} of search method M_i(c), i = 1, 2, ..., 12 in (2).

Experimentally, RT* is not well defined and must be replaced [2] by the global
minimum of a general trend curve T(c), i.e. RT* := min T(c) s.t. c ∈ P_i, i = 1, 2, ..., 12.

R.S. methods: T(c) approximated the randomly scattered experimental time data
RT(c) ± σ(c) with least curvature. In many-parameter R.S. methods this approximation
was repeated in the same way for each component of vector c.

Deterministic methods: The experimental scatter of the stochastic time variable RT(c)
was only 1 % even at the smallest run times of this study, RT ≈ 0.7 msec. Here the
structured experimental RT(c) data are approximated by T(c), which ignores possible
narrow relative time minima and resonances at isolated c-values.

Tables 3.1 and 3.2 list the parameter subspaces P_i*, i = 1, 2, ..., 12, of the observed
minimum position RT* for the Rosenbrock (F1) and Engvall (F6) functions
respectively. The minimum of the optimisation times on P_i* was determined with
independent high-accuracy runs with very small step widths Δc. The last column of
Tables 3.1 and 3.2 lists the c_i* which corresponds to the minimum; c_i* is not unique.
Tables 3.1 and 3.2 demonstrate that for one function the same components of the
parameter vector, e.g. radius c_0, differ for different methods. Also, the optimal parameters
for one search method applied to the two test functions F1 and F6 are different.

3.2. Optimal Run Times RT* and Number of Function Values NF* 

Table 3.3 lists the optimal run times RT* and the optimal number of evaluated function
values NF* at c* of Tables 3.1 and 3.2 for F1 and F6.

Table 3.4 shows the influence of derivatives on R.S. methods in solving (1):
Table 3.3. Optimal Run Times RT* and No. of Function Values NF*

                                  F1: Rosenbrock, ε = 10*^             F6: Engvall, ε = lo-'
Method                 Symbol     RT* [msec]              NF*          RT* [msec]         NF*
Uniform distrib.       M7         122.7 ± 1.1             1268 ± 12    134.1 ± 1.8        1354 ± 19
N(0,1)                 M1         251.9 ± 2.4             1336 ± 13    222.5 ± 2.5        1,132 ± 14
N(0, |∂F/∂x_i|)        M2         484 ± 14                2384 ± 70    9.187 ± 0.087      44.4 ± 0.4
N(0, |∂²F/∂x_i²|)      M3         207.0 ± 4.3             1000 ± 14    8.447 ± 0.091      38.3 ± 0.4
∇F                     M4         469.15 ± 0.70           4483         0.7283 ± 0.0050
Quasi-Newton           M5         387.47 ± 0.66           3014         0.7888 ± 0.0034
H^-1·∇F                M6         2.694 ± 0.02            8            2.013 ± 0.019
FSSRS                  M8         143.3 ± 1.4             1336 ± 13    130.1 ± 1.5        1149 ± 15
RANPAT                 M9         83.55 ± 0.76            809.1 ± 7.3  21.81 ± 0.17       214.6 ± 1.7
ASSRS                  M10        134.6 ± 1.9             1151 ± 17    16.61 ± 0.14       141.5 ± 1.2
(1/k)·∇F               M11 1)     ((0.5318 ± 0.0074 6)) 1)             28,639 ± 44 3)     264,411
F[x^k - p∇F(x^k)]      M12        1,938.1 ± 1.6           22,029       15.808 ± 0.031     184
  → Min.

Notes: 1) M11 did not converge for F1 with ε = 10“^. Data were obtained with ε = 0.05.

2) Data for M1, M2, ..., M10 were obtained in one program for F1 and for F6;
data for M11, M12 for F1, F6 were collected in 4 separate programs.

3) Mean values were determined from 1000 optimisation runs, apart from method M11
for F6, which used 300 runs only.

4) Quoted errors are standard errors for R.S. methods and standard deviations for
deterministic methods, which take into account the experimental results of the time
stability investigations of the computer, see [2].



Optimal Run Times [msec] for Single Parameter Search Methods

Table 3.4: Stochastic Search Methods without and with Derivatives

Symbol                           M1         M2                  M3
Distribution (see Table 1.1)     N(0,1)     N(0, |∂F/∂x_i|)     N(0, |∂²F/∂x_i²|)
F1: Rosenbrock                   252 ± 3    484 ± 14            207 ± 5 msec
F6: Engvall                      223 ± 3    9.2 ± 0.1           8.5 ± 0.1 msec

Table 3.5: Deterministic Search Methods with Derivatives

Symbol                           M4       M5              M6
Iteration (see Table 1.1)        ∇F       Quasi-Newton    Newton: H^-1·∇F
F1                               469      387             2.7 msec
F6                               0.73     0.79            2.0 msec
for F6: the search becomes faster: RT*(M3) < RT*(M2) < RT*(M1);
for F1: the search becomes slower: RT*(M2) > RT*(M1),
or it becomes faster: RT*(M3) < RT*(M1).

Apparently the introduction of a gradient into stochastic search methods does not
necessarily speed up the search for all functions.

Table 3.5 shows the influence of 1st and 2nd derivatives on deterministic methods:
for F1: the search becomes faster: RT*(M6) < RT*(M5) < RT*(M4);
for F6: the search becomes slower: RT*(M6) > RT*(M5) > RT*(M4).

I.e. the Newton method M6 was slower than the simple gradient method M4 for F6.
Deterministic gradient methods M11 and M12 are slower than M4, M5, M6; M11
with step width 1/k is the slowest of all methods in this study.

Noteworthy, too: for F1 stochastic methods M1 and M3 in Table 3.4 are faster than
deterministic methods M4 and M5 in Table 3.5:

for F1: RT*(M3) < RT*(M1) < RT*(M5) < RT*(M4), and RT*(M2) ≈ RT*(M4).
Hence “more information”, i.e. the use of derivatives, did not accelerate the search
in every case. Also, deterministic methods were not always faster than R.S. methods.
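
The orderings quoted above can be reproduced by sorting the optimal run times of Table 3.3. The short sketch below only re-uses the tabulated RT* values in msec; M11 is left out for F1 because its value was obtained with a different ε, and the variable names are chosen for this illustration.

```python
# Optimal run times RT* in msec, copied from Table 3.3.
rt_star = {
    "F1": {"M7": 122.7, "M1": 251.9, "M2": 484.0, "M3": 207.0, "M4": 469.15,
           "M5": 387.47, "M6": 2.694, "M8": 143.3, "M9": 83.55, "M10": 134.6,
           "M12": 1938.1},
    "F6": {"M7": 134.1, "M1": 222.5, "M2": 9.187, "M3": 8.447, "M4": 0.7283,
           "M5": 0.7888, "M6": 2.013, "M8": 130.1, "M9": 21.81, "M10": 16.61,
           "M11": 28639.0, "M12": 15.808},
}


def ranking(function):
    """Methods ordered from fastest to slowest for the given test function."""
    times = rt_star[function]
    return sorted(times, key=times.get)


rank_f1 = ranking("F1")
rank_f6 = ranking("F6")
```

The Newton method M6 heads the list for F1 but is only third for F6, where the plain gradient method M4 is fastest, which illustrates how strongly the order of efficiency depends on the test function.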

Table 3.6. Optimal Run Times [msec] for Many Parameter Random Search Methods

Method            M1         FSSRS      RANPAT     ASSRS
F1: Rosenbrock    252 ± 3    143 ± 2    84 ± 1     135 ± 2 msec
F6: Engvall       223 ± 3    130 ± 2    22 ± 1     17 ± 1 msec



Numerical results in Table 3.6 for many-parameter R.S. methods show:

(a) RT* is improved compared with the single-parameter method M1 for F1 and F6.

(b) The introduction of an accelerator factor (“BEF”) in RANPAT accelerates the
search compared to FSSRS (fixed step size) for F1 and for F6.

(c) A retardation factor to optimally control the step size in ASSRS accelerates the
search for F6 but slows down the search for F1.

Numerical results for M7, with a uniform distribution of random numbers on [-c, c], are:

a) M7 is faster than M1, which uses an N(0,c) distribution of random numbers.

b) M7 is slower than RANPAT.

c) M7 is faster for F1, and slower for F6, than R.S. methods M2, M3, M8, M10.

3.3. Time Optimal Components NFDOWN and NDEMI of Criterion Vector CR

Components NFDOWN and NDEMI of criterion vector CR coarsely describe the
progress of iterations relative to NX, the number of generated x-vectors:

NFDOWN = number of decreasing function values $F(x^{k+1}) < F(x^k)$, k = iteration index
NDEMI = number of x-vectors required to pass $F_{1/2}$ - this number is stored,

if $F^{k+1} < F^k$ and $F^{k+1} < F_{1/2} = F^* + 0.5 \cdot [F(x^0) - F^*]$
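
As a rough illustration of these two counters, the following sketch computes NFDOWN and NDEMI from a recorded sequence of function values. It assumes that `f_values[0]` is F(x^0) and that NDEMI counts the x-vectors generated after the start vector; the function name `progress_counters` and the sample numbers are hypothetical and are not taken from the study.

```python
def progress_counters(f_values, f_star):
    """Return (NFDOWN, NDEMI, F_half) for a recorded run.

    f_values[k] is F(x^k), with f_values[0] the start value F(x^0).
    NDEMI stays None if the half-way mark is never passed.
    """
    f_half = f_star + 0.5 * (f_values[0] - f_star)
    nfdown = 0
    ndemi = None
    for k in range(len(f_values) - 1):
        if f_values[k + 1] < f_values[k]:
            nfdown += 1  # one more decreasing function value
            if ndemi is None and f_values[k + 1] < f_half:
                ndemi = k + 1  # vectors generated until F_half was passed
    return nfdown, ndemi, f_half


# Hypothetical run with start value 10 and known optimum F* = 0.
values = [10.0, 12.0, 8.0, 6.0, 4.0, 5.0, 3.0]
nfdown, ndemi, f_half = progress_counters(values, 0.0)
nx = len(values) - 1  # number of generated x-vectors
print("NFDOWN/NX =", nfdown / nx, " NDEMI/NX =", ndemi / nx)
```

The ratios NFDOWN/NX and NDEMI/NX are the quantities the text compares between R.S. and deterministic methods.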

NFDOWN reveals one major difference between R.S. and deterministic methods:

R.S. methods more often fail to make progress on iteration; for numerical details see [2].
R.S. methods generate decreasing function values with ratios NFDOWN/NX of only
10 % or less for F1 and of about 30 % for F6. By contrast, deterministic methods
mostly generate a decreasing function value with every new iteration vector.

NDEMI, the number of iterated x-vectors needed to reach $F_{1/2}$, is a rough indicator of how
quickly a search method leaves the start position $x^0$.

R.S. methods: The number of half way iterations NDEMI to arrive at $F_{1/2}$ was different
for test functions F1 and F6, see Table 3.7.
For F1 methods M2, M3 (with derivatives) need an average of 8.5 % and 9 % of all
iterations in the beginning to decrease F(x) from $F(x^0)$ to $F(x) < F_{1/2}$. Other single and
many parameter R.S. methods need at most 6 iterations, which corresponds to less than
0.55 %, to pass the $F_{1/2}$ mark. For F6 between 5 % and 10 % of all iterations were needed
to reach $F_{1/2}$ after start from $x^0$.

Table 3.7 NDEMI/NX in % for F1 and F6 with Time Optimal Parameters in M1 - M12

Method   M7     M1     M2     M3     M4     M5     M6     M8     M9     M10    M11    M12

F1

F6       9.14   10.3   10.0   8.3    14.3   16.7   20.0   10.1   7.8









Deterministic methods reached $F_{1/2}$ of test function F1 after 1 to 4 iterations and $F_{1/2}$ of
F6 after the first iteration, which corresponds to NDEMI/NX ≤ 20 %.

In summary, the initial convergence rate of all investigated search methods is very large
for both test functions; it is largest for deterministic methods.

3.4. Practical Convergence Speed 

The number of iterations and/or the times required to pass preset function marks on the
path of iteration were used as a practical measure of convergence speed. The search
methods were applied with their time optimal c*, see Tables 3.1, 3.2. The distance
between $F(x^0)$ at start vector $x^0$ and F* was subdivided into 11 intervals JS:

JS = 0: $F^* \le F < F^* + \varepsilon$; ε = 10^-… for F1 (Rosenbrock); ε = 10^-… for F6 (Engvall)
JS = 1, 2, ..., 10: $F^* + \varepsilon + (JS-1) \cdot \Delta F \le F < F^* + \varepsilon + JS \cdot \Delta F$, $\Delta F = 0.1 \cdot (F(x^0) - F^*)$

During each run of, in general, 1000 runs the iterated function values were sorted into the
correct interval JS and the corresponding iteration number and run time were stored.
Finally, the averages of the (iteration, run time)-tuples and their standard deviations
were determined for each interval JS. Due to the storage and averaging process the
increase in time ranged from some 10 % to several 100 %.
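
The interval bookkeeping described above can be sketched as follows. This is only an illustration under stated assumptions: the helper names `interval_index` and `first_passage` are hypothetical, function values above the start value are clamped into JS = 10, and only the first iteration at which each interval is entered is kept (the study additionally stored run times and averaged over many runs).

```python
import math


def interval_index(f, f_star, f_start, eps):
    """Map a function value F to its interval number JS in 0..10."""
    delta_f = 0.1 * (f_start - f_star)
    if f < f_star + eps:
        return 0  # inside the eps-environment of F*
    js = int(math.floor((f - f_star - eps) / delta_f)) + 1
    return min(js, 10)  # assumption: values above the start go into JS = 10


def first_passage(f_values, f_star, eps):
    """Iteration index at which each interval JS is entered for the first time."""
    f_start = f_values[0]
    first = {}
    for k, f in enumerate(f_values):
        first.setdefault(interval_index(f, f_star, f_start, eps), k)
    return first


# Hypothetical run: start value 100, known optimum F* = 0, eps = 1e-4.
run = [100.0, 95.0, 42.0, 7.0, 0.5, 0.00001]
print(first_passage(run, 0.0, 1e-4))
```

Averaging such first-passage records over repeated runs gives one row per JS, which is the layout used in Table 3.8.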

As an example only, see [2], Table 3.8 lists the distribution of mean run times with
their standard errors versus function interval JS for function F6.

Most of the function decrease takes place during the first iterations, which very quickly
covered half the distance from F(x) at start $x^0$ to F*. This statement has been secured
reliably with measurements of the actual distribution of function values to determine the
practical convergence speed of a search method. Both deterministic and R.S. methods
used up most of the optimisation time and needed most of the iterations shortly before
they succeeded in entering the feasible region from nearby: convergence speed data
showed a jump discontinuity in the number of iterated function values moving from the
last 2 function intervals into the ε-environment of F*. In addition to the convergence speed
results, the intermediate output of iterated x-vectors and individual function values
provides strong support for reducing the local radius $c_0$ on approach of the minimum during
iterations. In this way the probability of entering the feasible region on
approach is increased. A measure for radius decrease may be derived from the history of the
iteration process: when the rate of function improvements falls with an increasing number of
iterations, the radius should decrease.

Table 3.8 Mean Time [msec] Required to Reach Function Interval JS for F6

JS: Definition see Ch. 3.4; NRUN = 1000, apart from M11: NRUN = 1 2)

JS    M7               M1               M2              M3

10    7.79 ± 0.03      8.70 ± 0.04      1.31 ± 0.35     0.66 ± 0.14
9     14.37 ± 0.02     16.79 ± 0.04     1.36 ± 0.32     1.20 ± 0.21
8     21.11 ± 0.03     25.06 ± 0.04     1.92 ± 0.36     1.52 ± 0.24
7     28.70 ± 0.03     34.41 ± 0.04     1.39 ± 0.34     2.20 ± 0.29
6     37.39 ± 0.03     45.20 ± 0.05     3.10 ± 0.47     2.97 ± 0.26
5     47.58 ± 0.04     58.04 ± 0.05     2.44 ± 0.31     3.47 ± 0.25
4     60.17 ± 0.04     73.72 ± 0.05     3.36 ± 0.28     4.61 ± 0.20
3     77.06 ± 0.04     94.52 ± 0.06     4.25 ± 0.22     5.44 ± 0.16
2     104.32 ± 0.05    127.87 ± 0.07    5.89 ± 0.13     7.17 ± 0.09
1     174.44 ± 0.08    213.25 ± 0.10    14.46 ± 0.04    14.50 ± 0.05
1)    32.63            40.44            5.20            4.86
0     284.62 ± 1.86    359.37 ± 2.58    21.32 ± 0.13    20.44 ± 0.13

JS    M4             M5             M6             M8

10    -              -              -              7.95 ± 0.04
9     -              -              -              14.51 ± 0.03
8     -              -              -              21.15 ± 0.03
7     -              -              -              28.61 ± 0.03
6     -              -              -              37.18 ± 0.04
5     0.13 ± 0.00    -              -              47.32 ± 0.04
4     -              -              -              59.77 ± 0.04
3     -              0.16 ± 0.00    0.44 ± 0.00    76.40 ± 0.04
2     -              -              -              102.87 ± 0.05
1     7.06 ± 0.01    6.98 ± 0.01    7.60 ± 0.01    170.10 ± 0.08
1)    0.64           0.55           0.62           31.28
0     8.43 ± 0.01    8.23 ± 0.01    9.13 ± 0.01    267.0 ± 1.5

JS    M9              M10             M11             M12

10    3.80 ± 0.07     2.48 ± 0.21     -               -
9     7.56 ± 0.026    3.64 ± 0.26     -               -
8     8.35 ± 0.03     4.28 ± 0.27     -               -
7     9.42 ± 0.04     4.86 ± 0.25     -               -
6     10.55 ± 0.04    5.23 ± 0.24     -               -
5     11.93 ± 0.04    5.58 ± 0.22     -               -
4     13.71 ± 0.04    7.00 ± 0.16     -               -
3     16.16 ± 0.04    8.04 ± 0.13     -               -
2     20.05 ± 0.04    9.07 ± 0.09     0.35            -
1     36.40 ± 0.05    21.63 ± 0.06    246.51 ± 4.2    12.64 ± 0.09
1)    10.25           8.98            133.6           0.25
0     55.88 ± 0.26    33.48 ± 0.13    -               22.37 ± 0.04

1) Line to demonstrate qualitatively the magnitude of standard deviations of run times
in the JS = 1 interval.

2) Iteration times and their scattering were recorded only for iteration 1 (JS = 2) and
iterations 2 to 1000 (JS = 1).

3.5. Numerical Performance of Search Methods 

For the time optimal choice of parameter vector c in Tables 3.1 and 3.2, values RT*,
NF* were tabulated before in Table 3.3. Tables 3.9c and 3.9d list for both test functions
F1 and F6 the sequence of search methods in the order of increasing optimal run times
RT* and of number of calculated function values NF*.

The ordered set of methods with increasing RT*

- differs for both functions F1 and F6;

- differs for one function from the ordered set of methods with rising NF*;

- shows that for F1 several R.S. methods are faster than deterministic methods;

- shows that the fastest and the slowest (= M11) method is deterministic.

For one test function the order of search methods with respect to RT* will change, too,
when ε or the start vector $x^0$ changes, or when a different component of criterion vector
CR (5) is optimized with regard to c.

Table 3.9 Set of Search Methods in the Order of Increasing Run Times RT
or No. of Function Values NF

Table 3.9a Search Methods: Symbols, see Table 1.1 in Ch. 1.5

M7       M1        M2          M3       M4       M5       M6
                   N(0,|f|)             ∇F                Newton-Method

M8       M9        M10         M11         M12
FSSRS    RANPAT    ASSRS       1/k · ∇F    $F(x^k - \rho \nabla F(x^k)) \to$ MIN

Table 3.9b Function F1; Not Optimal Choice of Parameters in Search Methods

Order   1     2     3     4     5     6     7     8     9     10    11    12
RT      M2    M4    M5    M1    M3    M8    M10               M6    M12   -
NF      M2    M6    M1    M3    M5                M7    M9    M12   -

Table 3.9c Function F1; Time Optimal Parameters in Search Methods, see Table 3.1

Order   1     2     3     4     5     6     7     8     9     10    11    12
RT*                 M7    M10                                             (M11)
NF*     M6    M9    M3    M10                           M5    M4          (M11)

Table 3.9d Function F6; Time Optimal Parameters in Search Methods, see Table 3.2

Order   1     2     3     4     5     6     7     8     9     10    11    12
RT*                 M6    M3                      M9    M8                M11
NF*                 M4                            M9    M1                M11



Tables 3.9a-3.9d serve to demonstrate the importance of using optimal
parameters in search methods when performance comparisons are made.

For parameter vectors that are not time optimal, as in Table 3.10 of [2], and for
test function F1, optimisation times RT and numbers of calculated function values NF
were determined. The order of search methods with increasing run times in Table 3.9b
is reversed compared with the order in Table 3.9c, where the parameter vectors are time optimal.

To evaluate the performance of a search method, the expense of time to determine
the optimal parameter vector c*, its complexity and its applicability to practical
problems must also be taken into account. The need to determine c* suggests not using
many parameter methods if they have more than 5 parameters to adjust, even if they
are fast. One may rather decide to resort to some method with fewer parameters although
it may be slower. The “most efficient” method for any given type of practical
optimisation problem must be determined experimentally.

4. Sensitivity of Search Methods to Parameters Other Than c;
Search When the Optimum is Not Known

4.1. Dependence of Optimal Variables RT*, NF*, c* on ε

Run times RT were measured dependent on c = $c_0$ for F1 and F6 (Ch. 1.4) using R.S.
methods M7 and M1. Only ε was changed and the minimum of RT($c_0$, ε) s.t. $c_0$ was
determined as before, see Ch. 2.1; RT*, $c_0^*$ and NF* are time optimal. Representations
of RT*, $c_0^*$, NF* as a function of ε ∈ [10^-…, 10^…] in a double logarithmic plot were
almost straight lines for both methods and for F1 and F6, see Figs. 4.1 and 4.2:
$\log y = \log a + b \log \varepsilon \iff y = a \cdot \varepsilon^b$; y := RT*, NF*, $c_0^*$

The functional relationship RT*(ε), NF*(ε), $c_0^*$(ε) is approximately
RT*^f NF*~f Co* -Vi forM7andF6 

RT* ~ ^ NF* - ^ Co* ~ forM7 and FI 

3 . 75 /- 3 . 8 /- ^ ' 

^8 ^8 
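
The straight-line relationship in the double logarithmic plot, y = a · ε^b, can be estimated from measured pairs (ε, y) by an ordinary least squares fit of log y against log ε. The sketch below is a minimal illustration of that fit; the function name `fit_power_law` and the sample data are hypothetical and do not reproduce the measurements of this study.

```python
import math


def fit_power_law(eps_values, y_values):
    """Least squares fit of log10(y) = log10(a) + b * log10(eps); returns (a, b)."""
    lx = [math.log10(e) for e in eps_values]
    ly = [math.log10(y) for y in y_values]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx                      # slope of the straight line
    a = 10.0 ** (mean_y - b * mean_x)  # intercept transformed back
    return a, b


# Hypothetical data generated from y = 3 * eps**(-0.5), for illustration only.
eps_grid = [1e-2, 1e-3, 1e-4, 1e-5]
times = [3.0 * e ** -0.5 for e in eps_grid]
a, b = fit_power_law(eps_grid, times)
print("a =", a, " b =", b)
```

A negative exponent b corresponds to run times that rise as the ε-environment shrinks.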

The investigation shows that time optimal values $c_0^*$ of parameter vector c for any
search method depend strongly on the ε-environment of F*. For this reason RT(c,...)
functions were measured throughout this study with some constant value ε, to determine
c*. If F* is not known the feasible region |F - F*| < ε is not defined, and c* remains
undefinable. Similarly, if x* is not known, the feasible region |x - x*| < ε_x and
consequently some $c_x^*$ are not defined. If both x* and F* are unknown, optimal parameter
vectors with regard to some convergence criterion are difficult to suggest.

4.2. Dependence of Minimum Range, c*, RT*, NF* on Start Vector

The sensitivity of the RT(c) function to changes in the start vector $x^0$ has been
investigated. To do so $\|x^0\|$ was increased by a factor of 100 to $\|x^{0'}\| = 100 \cdot \|x^0\|$.
Additionally, for test function F1, ε = 10^-… and R.S. method M1 with c = $c_0$ the RT($c_0$)
function was determined at four different start vectors of the same length, see Figs. 4.3
a-d, that form the corners of a rectangle. In each case the shape of RT($c_0$) is similar to
RT($c_0$) for the original small $x^0$. However the minimum position was shifted from $c_0^*$
= 0.04 to about 0.11 or 0.30 depending on the components of the start vector. The
optimal run times RT* and NF* were increased by approximate factors of 6 and 64.
Next, the RT($c_0$) function was determined for M2 and M4 which started the search
from $x^{0'} = 100 \cdot x^0 = (50, 200)$ for F6 (ε = 10^-…). For M2 and M4 the minimum
position was substantially decreased and optimisation times RT* and the number of
calculated function values NF* were largely increased. The changes were about

M2: $c_0^*(x^{0'}) \approx 1/266 \; c_0^*(x^0)$, $RT^*(x^{0'}) \approx 370 \; RT^*(x^0)$, $NF^*(x^{0'}) \approx 370 \; NF^*(x^0)$
M4: $c_0^*(x^{0'}) \approx 1/8900 \; c_0^*(x^0)$, $RT^*(x^{0'}) \approx 25000 \; RT^*(x^0)$, $NF^*(x^{0'}) \approx 21{,}600 \; NF^*(x^0)$

Data obtained for F1 with R.S. method M1 (with no derivatives) show that $c_0^*$ must
increase when the start vector $x^0$ is far from x*. In this way a large distance towards x*
can be covered fast. However, when F(x) is close to F* the probability of entering the
ε-environment reduces severely, and optimisation times must rise. This observation
suggests reducing $c_0$ for larger iteration numbers. Also, the reported data point to the
difficulty of choosing parameter c in some search method if x* is unknown, as from $x^0$
neither the distance nor the direction to x* is known.

4.3. Run Times RT(c) for Normalized Random Number Vectors

How does the distribution of random numbers in R.S. methods affect the search for the
optimum of a function? Investigations reported in Ch. 2 have shown that the RT($c_0$)
functions for method M7 (with uniform distribution) and for M1 (with normal
distribution of random numbers) are both parabola-like in shape, see Fig. 2.1 and 2.2.
Their minima were at $c_0^*$ = 0.052 (M7) and $c_0^*$ = 0.040 (M1) for test function F1 and
at $c_0^*$ = 0.012 (M7) and $c_0^*$ = 0.0080 (M1) for F6. Also, M7 was faster than M1
(Ch. 3.2). Additionally, the influence of normalized random number vectors in R.S. search
methods on optimisation times RT($c_0$) has been investigated. These random number
vectors are uniformly distributed on a sphere with radius $c_0$ [18]. This type of distribution
has been suggested by several authors - e.g. [1,7,12,13,17]. As an example the
RT($c_0$) functions for M1, M2 and F1 are represented in Fig. 4.4. RT($c_0$) for M3 has not
been included in Fig. 4.4 as it would lie in between both RT($c_0$) functions and
obstruct the readability of that figure.

Fig. 4.4 RT = RT($c_0$) for R.S. Methods M1, M2 with Normalized Random Number
Vectors Z(ω)/||Z(ω)|| and RT($c_0$) for M1 with Not Normalized Z(ω)

Function = F1; |F - F*| < ε = 10^-…; NRUN = 100; axis: TIME/RUN [msec].
Numbers inserted are the number of runs stopped with RT > TLIMIT among 100 runs.
These runs are not counted to determine the mean optimisation time at some value $c_0$.
M1: $x^{k+1} = x^k + c_0 Z(\omega)/\|Z(\omega)\|$; $Z_i(\omega)$: N(0,1) distributed; TLIMIT = 20 sec.
M2: $x^{k+1} = x^k + c_0 Z(\omega)/\|Z(\omega)\|$; $Z_i(\omega)$: N(0,|…|) distrib.; TLIMIT = 50 sec.
M1: $x^{k+1} = x^k + c_0 Z(\omega)$; $Z_i(\omega)$: N(0,1) distributed; TLIMIT = 4 sec.

Comparison of the RT($c_0$) functions for M1 and for M2 with RT($c_0$) for M1, which uses
not normalized random numbers, shows:

For $c_0$ ≤ 0.26 RT($c_0$) is similar in shape with slightly increased run times; for $c_0$ ≥ 0.26
RT($c_0$) remains flat. The position $c_0^*$ of min RT($c_0$) s.t. $c_0$ for M1, M2, M3 for normalized
random number vectors is approximately $c_0^*$ ≈ 0.024, if run sets were considered
that have only “successful” runs and no run stops due to non-converging runs. Position
$c_0^*$ ≈ 0.024 differs from $c_0^*$ in M1, M2, M3 of Table 3.1 with not normalized random
number vectors. Run times RT* and numbers of function values NF* are approximately
larger by factors 1.2 for M1, 1.6 for M2, and 2.0 for M3. If random vectors have
constant lengths larger than approximately $c_0$ ≈ 0.24, an increasing number of runs
failed to enter the feasible region, as was qualitatively expected, see e.g. [11].
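
A common way to obtain random vectors that are uniformly distributed on a sphere of radius c_0 is to normalize a vector of independent N(0,1) numbers, as in the iteration x^{k+1} = x^k + c_0 Z/||Z||. The following sketch illustrates this construction; the function name `step_on_sphere` and the chosen dimension and seed are assumptions made for the example.

```python
import math
import random


def step_on_sphere(dim, c0, rng):
    """Random step of fixed length c0, uniformly distributed in direction."""
    while True:
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in z))
        if norm > 0.0:  # guard against the (practically impossible) zero vector
            return [c0 * v / norm for v in z]


rng = random.Random(1)
x = [-1.2, 1.0]                      # hypothetical start vector
step = step_on_sphere(len(x), 0.024, rng)
x_next = [xi + si for xi, si in zip(x, step)]
print(x_next)
```

Because every step has exactly the length c_0, a radius that is too large can no longer be compensated by occasional short steps, which is in line with the growing number of failed runs reported for large c_0.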

4.4. Change of Convergence Criterion from $|F^k - F^*| < \varepsilon$ to $|F^{k+1} - F^k| < \varepsilon$

4.4.1 Distance F - F* for New Convergence Criterion.

If F* is known, |F - F*| < ε is a suitable convergence criterion for any method to stop
the search for x*. Investigations reported before had the aim to make statements on the
optimal parameter vector c* in 12 search methods. If x* were known, an analogous
investigation could be carried out with |x - x*| < ε_x. If neither F* nor x* is known, the
ε-environment of F* or of x* is not defined. The problem arises where to stop the search.
Some convergence criterion must be applied that is related to the history of the search
for the optimum, e.g. min{$F(x^k)$, k = 1,2,...,n} s.t. k = 1,2,...,n or $|F^{k+1} - F^k| < \varepsilon$.

If $F^{k+1} < F^k$ and $|F^k - F^*| < \varepsilon$ then $|F^{k+1} - F^k| < \varepsilon$ follows, but the reverse is not true! As
an example the distance from the known optimum F* was investigated using the
previous time optimal c*, when the search was terminated with:

$|F^{k+1} - F^k| < \varepsilon$.   (11)

The distances are listed in Table 4.1: With (11) some deterministic methods still
achieved convergence. However, most methods stopped the search far ahead of F*.

4.4.2 Dependence of F - F* at Stop as a Function of Sample Parameter NXOLD
in Many Parameter R.S. Methods.

The main disadvantage of single parameter search methods in approximating F* when
(11) is used is that convergence criterion (11) is examined every time after two
successive iterations. Real search problems with unknown optimum (x*, F*) are best
solved with many parameter methods that group the iterated function values in sets of
a sufficiently large number NSET of sample points. Then the minima of two
consecutive sets may be compared:

$F^{k+1} = \min \{F(x_{k+1,1}), F(x_{k+1,2}), \ldots, F(x_{k+1,NSET})\}$

$F^k = \min \{F(x_{k,1}), F(x_{k,2}), \ldots, F(x_{k,NSET})\}$

$\to |F^{k+1} - F^k| < \varepsilon$ ?   (12)
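
The difference between testing (11) after every single iteration and testing (12) on the minima of two consecutive sample sets can be illustrated with a few lines of code. This is only a sketch: the names `pointwise_stop` and `setwise_stop` and the numbers are hypothetical.

```python
def pointwise_stop(f_prev, f_curr, eps):
    """Criterion (11): compare two successive function values."""
    return abs(f_curr - f_prev) < eps


def setwise_stop(prev_set, curr_set, eps):
    """Criterion (12): compare the minima of two consecutive sets of NSET values."""
    return abs(min(curr_set) - min(prev_set)) < eps


# Hypothetical function values of two consecutive sets with NSET = 4.
previous = [5.00, 4.20, 4.80, 4.21]
current = [4.50, 4.30, 3.10, 4.40]

# Two neighbouring values of the first set already satisfy (11) ...
print(pointwise_stop(previous[1], previous[3], 0.05))
# ... while the set minima 4.20 and 3.10 still differ clearly, so (12) continues.
print(setwise_stop(previous, current, 0.05))
```

A larger NSET delays the termination test, which is the effect studied with the parameter NXOLD in Table 4.2.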
Table 4.1 Distance $F^{k+1} - F^*$ when the Search is Terminated with $|F^{k+1} - F^k| < \varepsilon$

Search Methods M1 - M12 Use Optimal Parameter c* of Tables 3.1, 3.2

Function                             F1: Rosenbrock                F6: Engvall

Method                         ε     10^-…                         10^-…

M7     uniform distrib.              [1831.2 ± 219.0] x 10^-…      [16,203 ± 8091] x 10^-…
M1     N(0,1)                        [1435.1 ± 216.7] x 10^-…      [13,994 ± 10,912] x 10^-…
M2     N(0,|…|)                      [303.9 ± 71.1] x 10^-…        [30.8 ± 18.5] x 10^-…
M3     N(0,…)                        [462.2 ± 96.5] x 10^-…        [2.34 ± 0.33] x 10^-…
M4     ∇F                            426.3 x 10^-…                 0.29 x 10^-…
M5                                   300.1 x 10^-…                 0.000058 x 10^-…
M6     H^-1 ∇F(x)                    0.38 x 10^-…                  0.0011 x 10^-…
M8     FSSRS                         [7.73 ± 1.43] x 10^-…         [1.496 ± 0.094] x 10^-…
M9     RANPAT                        [6.54 ± 0.91] x 10^-…         [0.488 ± 0.024] x 10^-…
M10    ASSRS                         [84.12 ± 43.89] x 10^-…       [0.051 ± 0.011] x 10^-…
M11    1/k · ∇F                      491.5 x 10^-…                 1216 x 10^-…
M12    F(x^k - ρ∇F(x^k)) → MIN       12814 x 10^-…                 0.020 x 10^-…











Table 4.2 Dependence of $|F^{k+1} - F^*|/\varepsilon$ and of RT on NXOLD with Termination Criterion $|F^{k+1} - F^k| < \varepsilon$:

Test Function = F1: F* = 0, ε = 10^-…

Notes: Between $F^k$ and $F^{k+1}$, 2 · NXOLD sample points were generated;

Search methods are: M8 = FSSRS, M9 = RANPAT, M10 = ASSRS

Method   NXOLD =      5                10              15              20              25

|F^{k+1} - F*| / ε
M8                    563.5 ± 132.9    77.4 ± 39.0     18.5 ± 3.1      45.7 ± 38.7     7.7 ± 1.4
M9                    236.9 ± 71.4     31.4 ± 3.1      18.1 ± 2.4      8.1 ± 0.8       6.5 ± 0.9
M10                   125.0 ± 33.5     84.9 ± 18.7     72.7 ± 19.5     83.7 ± 39.3     84.1 ± 43.9

Run Time [msec]
M8                    179.2 ± 6.1      182.1 ± 5.8     199.2 ± 6.6     207.9 ± 6.9     209.9 ± 6.9
M9                    60.21 ± 0.91     68.1 ± 1.1      74.2 ± 1.0      81.1 ± 1.7      85.6 ± 1.8
M10                   89.2 ± 1.1       95.5 ± 1.3      97.0 ± 1.3      102.1 ± 1.4     106.6 ± 1.6

Method   NXOLD =      30               40              60              80              100

|F^{k+1} - F*| / ε
M8                    9.1 ± 1.7        3.2 ± 0.5       1.60 ± 0.27     0.77 ± 0.06     0.70 ± 0.05
M9                    4.2 ± 0.6        1.6 ± 0.2       0.64 ± 0.07     0.31 ± 0.03     0.19 ± 0.02
M10                   30.3 ± 10.5      9.2 ± 1.1       4.30 ± 0.73     2.20 ± 0.52     0.96 ± 0.16

Run Time [msec]
M8                    197.9 ± 5.1      215.3 ± 6.4     226.9 ± 5.9     258.4 ± 7.7     258.9 ± 7.2
M9                    90.6 ± 2.0       94.6 ± 1.4      115.4 ± 3.6     120.4 ± 2.6     150.0 ± 5.3
M10                   112.4 ± 1.8      115.8 ± 1.7     131.4 ± 1.9     145.5 ± 1.9     157.7 ± 2.0
To demonstrate the effect of the sample size, R.S. methods M8 (FSSRS), M9
(RANPAT), M10 (ASSRS) have been run at their time optimal parameters of Tables
3.1 and 3.2 apart from a varying sample size NSET, called NXOLD (see Ch. 1.5), to
solve (1) with (12). Table 4.2 lists the distance F - F* for test function F1 as a function
of NXOLD, when methods M8, M9, M10 stop the search because condition
$|F^{k+1} - F^k| < \varepsilon$ is met. Experimentally it follows that the ε-environment of F* is reached:
for M8 with NXOLD ≥ 80, for M9 with NXOLD ≥ 60, for M10 with NXOLD ≥ 100.

Method         M8                              M9                              M10

Table 3.3:     RT*(25) = 143.3 ± 1.4 [msec]    RT*(25) = 83.55 ± 0.76 [msec]   RT*(25) = 134.6 ± 1.9 [msec]
Table 4.2:     RT(80) = 258.4 ± 7.7 [msec]     RT(60) = 115.4 ± 3.6 [msec]     RT(100) = 157.7 ± 2.0 [msec]
time factor    1.80                            1.38                            1.17

These data show that many parameter methods can succeed in reaching |F - F*| < ε at the
expense of optimisation time; here, though, the retardation was less than a factor of 2.

They must have a sampling facility for function values in order to artificially delay the
test of the termination criterion. R.S. methods will considerably accelerate the search if
at any random iteration step a function value is also calculated in the deterministic
opposite direction in order to make the best choice for progress. These methods should
keep some record of previous iterations (directions of descent) and adapt their stepsize,
i.e. the “radius”, locally: e.g. by activation of an accelerator on progress and of a
reduction factor after successive failures to move on.
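
The recommendations in the last paragraph can be combined into a small random search sketch: each random step is also tried in the opposite direction, the radius is enlarged after progress and reduced after several successive failures. This is an illustration only, not one of the methods M1 - M12; the names `adaptive_random_search`, `accel`, `shrink` and `max_fail` and all default values are assumptions, and the known optimum F* is used for termination as in the earlier chapters.

```python
import math
import random


def adaptive_random_search(f, x0, c0, f_star, eps, rng,
                           accel=2.0, shrink=0.5, max_fail=3, max_iter=10000):
    """Random search with opposite-direction test and locally adapted radius."""
    x, fx = list(x0), f(x0)
    radius, fails, n_eval = c0, 0, 1
    for _ in range(max_iter):
        if abs(fx - f_star) < eps:
            break
        z = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(v * v for v in z)) or 1.0
        step = [radius * v / norm for v in z]
        forward = [xi + si for xi, si in zip(x, step)]
        backward = [xi - si for xi, si in zip(x, step)]  # opposite direction
        f_fwd, f_bwd = f(forward), f(backward)
        n_eval += 2
        cand, f_cand = (forward, f_fwd) if f_fwd <= f_bwd else (backward, f_bwd)
        if f_cand < fx:
            x, fx = cand, f_cand
            radius *= accel          # accelerator on progress
            fails = 0
        else:
            fails += 1
            if fails >= max_fail:    # reduction after successive failures
                radius *= shrink
                fails = 0
    return x, fx, n_eval


def sphere(x):
    """Simple quadratic test function with minimum F* = 0 at the origin."""
    return sum(v * v for v in x)


x_end, f_end, n_eval = adaptive_random_search(sphere, [2.0, 2.0], 0.5, 0.0, 1e-4,
                                              random.Random(0))
print(f_end, n_eval)
```

Only improving steps are accepted, so the recorded function values never increase; the price is two function evaluations per iteration.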

5. Conclusions 

Investigations have proven that the computer can be used as a stop watch for
optimisation times of the order measured in this study. Its time fluctuations and its
time drifts do not impede accurate time data collection: the time stability of the
computer was monitored reliably at any instant.

The experimental results have shown that optimisation times RT needed to solve (1)
depend severely on the choice of parameter vector c in the applied search method
$M_i(c)$, i = 1, 2, ..., 12. Convergence or non-convergence (i.e. time excess or divergence
with too large function values) are determined by parameter vector c. For different
methods the RT(c) functions are different. RT(c) may have a few relative minima, or
may be heavily structured with low time coordinates and resonances. In general, the
RT(c) functions for stochastic methods were little structured.

If time optimal parameters c* were chosen for each search method one definite order of
search methods with rising run times RT* can be found. This order is different for each
function. The order - related to one particular function - was altered, if one component
in parameter vector c, e.g. “radius” $c_0$, was set constant in all search methods. Indeed,
by changing parameter vector c that order may be reversed, manipulating some slow
method to become “more efficient” than a fast one in solving (1). The order of methods
with regard to rising optimal run times differs from the order that follows for the
number of calculated function values.

Once the search method to use in some given optimisation problem has been decided
upon, its parameter vector should be changed tentatively to select some practical
“good” value c for further fast optimisations. The need to do so is based on the
observed strong c-dependence of optimisation times in any search method. The search
must be carried out with different sample sizes to control the stability of the final (x, F)
position at termination by convergence criterion $|F^{k+1} - F^k| < \varepsilon$. Eventually the search
for an unknown optimum must be repeated with decreasing feasibility parameters ε in
order to locate the unknown optimum F* more accurately.

Several simple regression estimators can be constructed to approximate the distribution
function of the m-dimensional normal distribution along a line. These
functions can be used to find the border points of the feasible region of probability
constrained stochastic programming models. Computer experiences show a fast
and robust behaviour of the root finding techniques.

Keywords: multinormal distribution, stochastic programming, regression

1 Introduction

Consider the m-dimensional normal distribution with expected value 0 and
correlation matrix R. Its distribution function and density function are given
as

$\Phi(h) = \int_{-\infty}^{h_1} \cdots \int_{-\infty}^{h_m} \varphi(z)\,dz,$   (1)

$\varphi(z) = (2\pi)^{-m/2} |R|^{-1/2} \exp\{-\tfrac{1}{2} z^T R^{-1} z\}.$

Computation of the function values $\Phi(h)$ is required in numerical optimization
procedures of stochastic programming problems, when the random variables
of the model have a joint normal distribution. This is the case in solution
procedures of the STABIL stochastic programming model [11] and the two-stage
model [9]. Other problems where computation of (1) is required can be found
in diverse areas of statistics and engineering ([1], [6], [4]).

There are three Monte Carlo methods for efficiently computing values of
$\Phi(h)$ in higher (m > 5) dimensions. One of them is the method of orthonormalized
estimators [2], the other one is based on Boole-Bonferroni inequalities [13],
and the latest technique is a hybrid method [8]. These Monte Carlo methods
give realizations $\eta_1, \ldots, \eta_n$ of a random variable $\eta$ where $E(\eta) = \Phi(h)$, so in
numerical computations the unbiased estimator

$y = \frac{1}{n} \sum_{i=1}^{n} \eta_i$

is used to approximate the value $\Phi(h)$. If $D^2(\eta) = \sigma^2$, then $D^2(y) = \sigma^2/n$, so
the standard deviation of the method - which is usually called the error of the
Monte Carlo computation - is $\sigma/\sqrt{n}$. Invoking the central limit theorem, $\eta$ (and
y) is assumed to have a truncated normal distribution $\eta \in N(\Phi(h), \sigma)$, where
$0 < \Phi(h) - 3\sigma < \eta < \Phi(h) + 3\sigma < 1$.
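
The averaging step and its error $\sigma/\sqrt{n}$ can be sketched in a few lines. The example uses a crude indicator estimator for independent components (R equal to the identity matrix) purely to produce realizations; it is not one of the three Monte Carlo methods cited above, and the names `mc_estimate` and `crude_realizations` are hypothetical.

```python
import math
import random


def mc_estimate(realizations):
    """Return the sample mean y and its estimated standard error sigma/sqrt(n)."""
    n = len(realizations)
    y = sum(realizations) / n
    s2 = sum((v - y) ** 2 for v in realizations) / (n - 1)  # estimate of sigma^2
    return y, math.sqrt(s2 / n)


def crude_realizations(h, n, rng):
    """Indicator realizations of {Z_j <= h_j for all j}, independent N(0,1) components."""
    out = []
    for _ in range(n):
        inside = all(rng.gauss(0.0, 1.0) <= hj for hj in h)
        out.append(1.0 if inside else 0.0)
    return out


rng = random.Random(3)
y, err = mc_estimate(crude_realizations([0.5, 1.0, 1.5], 4000, rng))
print("estimate:", y, "+/-", err)
```

Quadrupling n halves the error, which is why the accuracy of function evaluation is treated as an adjustable cost in the root finding procedure.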

A problem frequently arising in optimization procedures is to find the intersection
of a line with the boundary of the set of feasible solutions. Assume that
for a given point z and a direction d the halfline $z + x(d - z)$, $x > 0$, intersects
the surface given by $\Phi(h) = p$; then, using the notation $f(x) = \Phi(z + x(d - z))$,
the intersection point can be found by solving the equation

$f(x) = \Phi(z + x(d - z)) = p$   (2)

for the unknown x; the halfline is assumed to contain the root. Solving (2) is
called root finding, which basically amounts to finding the p-quantile of the function
f. For the solution of (2) Szantai [14] presented a heuristic algorithm that
moves up and down along a line, increasing the accuracy of function evaluation
and decreasing the steplength. This seems to be unsatisfactory, since the new
approximate root is determined depending only on the last two or three function
values, completely disregarding previous function values. Our technique
takes into consideration all previously computed function values, which hopefully
increases the efficiency.

A solution of the problem (2) can be obtained in the following way: (i) using
regression techniques, a function approximating $f(x)$ is constructed; (ii) then
the root of this approximating function is computed; (iii) if this approximate
root is sufficiently accurate, we can stop, otherwise by using more points the
approximating regression function is recomputed, and the whole procedure is
repeated.

In the next section the basic linear regression techniques are described;
some more estimators, full details of the estimators and other possible uses are
given in [4], [5]. In Section 3 a preliminary algorithm of root finding is outlined,
together with some details and considerations about the parameters. In Section
4 the final algorithm is described, while in Section 5 computer experiences
and numerical results are presented. Finally some remarks and conjectures are
drawn in the last section.

2 Least squares regression estimators

Consider the one-dimensional function $f(x) = \Phi(z + x(d - z))$, for fixed vectors
z and d, and assume for now that d is an increasing direction, that is $f(x_1) < f(x_2)$
if $x_1 < x_2$. Since $\Phi(\cdot)$ is logconcave (see [10], [11]), $f(x)$ is also
logconcave. It is assumed that we have a Monte Carlo method for computing
approximate values of the function f, that is for arbitrary points $x_i \in R^1$, $i = 1, \ldots, n$,
realizations $p_i$, $i = 1, \ldots, n$ of a random variable can be computed,
where $p_i \in N(f(x_i), \sigma)$, and N is a truncated normal distribution, restricted
to (0,1). To find approximations to $f(\cdot)$ regression techniques will be used, that
is for a given set $S = \{x_i, p_i\}_{i=1}^{n}$ the parameters of an estimator $t(x)$ will be
computed. Several types of the function $t(x)$ are suggested in this section.



2.1 Initial estimator - approximation of $f(x)$ by quadratic
regression

Let us look for the estimator $t_1(x)$ in the form of a quadratic function $g_1(x) = a_1 x^2 + b_1 x + c_1$.
To determine the best fit of type $g_1(x)$ to the function $f(x)$ for
a given set $S = \{x_i, p_i\}_{i=1}^{n}$ the problem

$\min_{a_1, b_1, c_1} \sum_{i=1}^{n} [p_i - (a_1 x_i^2 + b_1 x_i + c_1)]^2$   (3)

is to be solved for the unknown parameters $a_1, b_1, c_1$. The first order optimality
criteria of this minimization problem can be obtained by differentiating (3) with
respect to the parameters, and setting the derivatives equal to zero; a system
of three linear equations emerges, from which the parameters can be expressed
easily (for the corresponding explicit formulas see [7], [4]).

Since $f(x)$ is monotone increasing, only the increasing part of the function
$g_1(x)$ can be used as an estimator to $f(x)$, so by defining the solution of the
equation $g_1'(x_t) = 0$ to be the truncation point (where $x_t = -b_1/(2a_1)$) the
estimator is given as

$t_1(x) = \begin{cases} g_1(x), & \text{if } -\infty < x < x_t \\ g_1(x_t) = c_1 - b_1^2/(4a_1), & \text{if } x_t \le x < \infty. \end{cases}$   (4)
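
A minimal sketch of this initial estimator is given below. It solves the three normal equations of (3) by Gaussian elimination (optional weights are included so that the same routine could serve weighted problems) and then builds the truncated function (4). The names `fit_quadratic` and `truncated_quadratic` and the sample data are assumptions of the example; the truncation assumes a concave fit, $a_1 < 0$.

```python
def fit_quadratic(xs, ps, ws=None):
    """Least squares fit of a*x^2 + b*x + c to the points (x_i, p_i); returns (a, b, c)."""
    if ws is None:
        ws = [1.0] * len(xs)
    s = [sum(w * x ** k for x, w in zip(xs, ws)) for k in range(5)]
    r = [sum(w * p * x ** k for x, p, w in zip(xs, ps, ws)) for k in range(3)]
    # Normal equations for the unknowns (a, b, c), with right-hand side appended.
    m = [[s[4], s[3], s[2], r[2]],
         [s[3], s[2], s[1], r[1]],
         [s[2], s[1], s[0], r[0]]]
    for col in range(3):  # Gaussian elimination with partial pivoting
        piv = max(range(col, 3), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for row in range(col + 1, 3):
            factor = m[row][col] / m[col][col]
            for k in range(col, 4):
                m[row][k] -= factor * m[col][k]
    coef = [0.0, 0.0, 0.0]
    for row in (2, 1, 0):  # back substitution
        acc = m[row][3] - sum(m[row][k] * coef[k] for k in range(row + 1, 3))
        coef[row] = acc / m[row][row]
    return tuple(coef)


def truncated_quadratic(a, b, c):
    """Estimator (4): increasing branch of the parabola, constant beyond x_t (a < 0)."""
    x_t = -b / (2.0 * a)
    top = c - b * b / (4.0 * a)

    def t1(x):
        return a * x * x + b * x + c if x < x_t else top

    return t1, x_t


# Hypothetical, noise-free sample taken from g(x) = -0.5 x^2 + 2 x + 0.1.
xs = [0.0, 1.0, 2.0, 3.0]
ps = [-0.5 * x * x + 2.0 * x + 0.1 for x in xs]
a, b, c = fit_quadratic(xs, ps)
t1, x_t = truncated_quadratic(a, b, c)
print(a, b, c, x_t, t1(1.0), t1(3.0))
```

With noisy Monte Carlo values $p_i$ the same code applies unchanged; only the recovered coefficients then scatter around the underlying curve.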



2.2 Logarithmic estimator - approximation of $\log f(x)$ by
quadratic regression

Instead of approximating $f(x)$ directly, now a quadratic function $g_2(x) = a_2 x^2 + b_2 x + c_2$
will be used to approximate the function $\log f(x)$, that is the logarithmic
estimator $t_2(x)$ of the form $\exp(g_2(x))$ will be used to approximate $f(x)$. This is
still a linear estimator - the parameters of $t_2(x)$ can be expressed from a system
of linear equations. To find the best - least squares - approximation of the form
$g_2(x)$ to $\log f(x)$ the following problem is to be solved

$\min_{a_2, b_2, c_2} \sum_{i=1}^{n} [q_i - (a_2 x_i^2 + b_2 x_i + c_2)]^2 w_i,$   (5)

where $q_i = \log p_i$ and $w_i$ are some weights, given below. The parameters
$a_2, b_2, c_2$ can be expressed from (5) the same way as it was done for the initial
estimator. The weights $w_i$ are introduced to counteract the distorting effect of
the logarithmic transformation: $\log(x + \delta) - \log x$ depends on x also, not only on
$\delta$. For a given $t^*(x)$ the quantity $w_i^* = (p_i + t^*(x_i))^2/4$, $i = 1, \ldots, n$ is defined,
where $t^*$ is any previously computed estimator. After normalization the final
weights are obtained as $w_i = w_i^* / \sum_{j=1}^{n} w_j^*$. The truncation point is $x_t = -b_2/(2a_2)$,
and so the logarithmic estimator becomes

$t_2(x) = \begin{cases} \exp(g_2(x)), & \text{if } -\infty < x < x_t \\ \exp(g_2(x_t)) = \exp(c_2 - b_2^2/(4a_2)), & \text{if } x_t \le x < \infty. \end{cases}$   (6)
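
The preparation of the weighted problem (5) can be sketched as follows: the observations are transformed to $q_i = \log p_i$ and the weights are built from the current observations and a previously computed estimator. The function name `log_regression_inputs` and the numbers are hypothetical, and normalizing the weights so that they sum to one is an assumption of this sketch.

```python
import math


def log_regression_inputs(ps, t_prev_values):
    """Return (q, w): targets q_i = log p_i and normalized weights w_i for problem (5).

    t_prev_values[i] is the value t*(x_i) of a previously computed estimator.
    """
    q = [math.log(p) for p in ps]
    raw = [(p + t) ** 2 / 4.0 for p, t in zip(ps, t_prev_values)]  # w_i^*
    total = sum(raw)
    w = [v / total for v in raw]  # assumption: weights normalized to sum to one
    return q, w


# Hypothetical observations p_i and previous estimator values t*(x_i).
p_obs = [0.20, 0.45, 0.70, 0.88]
t_prev = [0.22, 0.44, 0.69, 0.90]
q, w = log_regression_inputs(p_obs, t_prev)
print(q)
print(w)
```

Since the variance of $\log p_i$ is roughly proportional to $1/f(x_i)^2$, small probabilities receive small weights, which is the distortion the weighting is meant to counteract.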



2.3 Reverse - logarithmic estimator - approximation of 
log(l — f{x)) by quadratic regression 

Now the function log(l — f{^)) is approximated with a quadratic function 
^3(0:) = asx^ + bsx + C3 that is a function ts{x) is constructed in the form 
1 — exp{gs{x)) to approximate the original function f{x). This is a linear esti- 
mator, too, its parameters can be obtained by solving the problem 

n 

min V [n - (asxf + 63X4 + C3)]^u),-, ( 7 ) 

a3,^>3)C3 

where $r_i = \log(1 - p_i)$ and the weights $w_i$ are computed by normalizing
the values $w_i^* = (1 - p_i + \exp(g_3^*(x_i)))^2/4$, where $g_3^*(x)$ is a
previously computed version of $g_3$. Obviously, instead of $\exp(g_3^*(x))$
the function $1 - t^*(x)$ can be used with any previous estimator $t^*$ of
$f(x)$.

Here $1 - f(x)$ is a monotone decreasing function, so only the monotone
decreasing portion of $g_3(x)$ is to be used in the approximation.
Correspondingly, taking $x_t = -b_3/(2a_3)$ for the truncation point, the
estimator $t_3$ is defined by



$$t_3(x) = \begin{cases} 1 - \exp(g_3(x_t)) = 1 - \exp\left(c_3 - b_3^2/(4a_3)\right), & \text{if } -\infty < x < x_t, \\ 1 - \exp(g_3(x)), & \text{if } x_t \le x < \infty. \end{cases} \qquad (8)$$

3 Considerations on root finding strategies

Now we return to our problem of finding a solution $\hat{x}_r$ which
approximates the real root $x_r$ of the problem

$$f(x_r) = p, \qquad (9)$$

which corresponds to the problem of finding an approximate solution to the
equation $\Phi(z + x(d - z)) = p$ for given vectors $z$, $d$ and reliability
level $p$. An approximate root $\hat{x}_r$ can be obtained by solving the
equation $t(x) = p$, where $t$ is any of the previously described estimators.

Our strategy for finding $\hat{x}_r$ is realized by an iterative procedure in
which the estimator is successively made more and more accurate in the
neighbourhood of the approximate root. This is achieved by constructing a
sequence of shrinking intervals $[\alpha_i, \beta_i]$ around the value $p$, and
the corresponding intervals $[a_i, b_i]$ around the approximate roots
$\hat{x}_r$ of $f(x_r) = p$ (here $t^i(a_i) = \alpha_i$, $t^i(b_i) = \beta_i$,
where $t^i$ is the actual estimator in the $i$-th step of the algorithm). The
length of the intervals is decreased in each step of the procedure. To keep the
notation simple, the algorithm is described for
$t_1(x) = g_1(x) = a_1 x^2 + b_1 x + c_1$ (after the necessary small
modifications the estimators $t_2$ and $t_3$ can be used equally well), and to
simplify the notation further the indices of $a_1, b_1, c_1$ are also dropped.

3.1 Preliminary version of the algorithm 

0. [Initialization.] Assume that we have a set of points
$S_0 = \{x_{0j}, p_{0j}\}_{j=1}^{K}$, an initial function value interval
$[\alpha_0, \beta_0]$ and iteration counter $i = 0$. Using $S_0$ the function
$g(x)$ can be constructed. Set $\sigma_0 = 3\sigma$ as the half length of the
initial function value interval, where $\sigma$ is the standard deviation of
the values $p_i$.

1. [Computation of the function value interval $[\alpha_i, \beta_i]$.] Increase
the iteration counter, $i = i + 1$, and compute $\sigma_i = q\,\sigma_{i-1}$,
where $q < 1$ is a reduction factor. Let $\alpha_i = \max(p - \sigma_i, 0)$,
$\beta_i = \min(p + \sigma_i, 1)$.

2. [Computation of the interval $[a_i, b_i]$.] Compute
$a_i = g^{-1}(\alpha_i)$, $b_i = g^{-1}(\beta_i)$.

3. [Recomputing the estimator $g(x)$.] Determine $K$ new points $x_{ij}$,
$j = 1, \ldots, K$, in the interval $[a_i, b_i]$ (for example uniformly) and,
using a Monte Carlo technique, compute the approximate values
$p_{ij} \approx f(x_{ij})$, $j = 1, \ldots, K$. Let
$S = S_0 \cup \cdots \cup S_i$, where $S_i = \{x_{ij}, p_{ij}\}_{j=1}^{K}$, and
recompute $g(x)$ using the set of points $S$.

4. [Termination criteria.] If $\hat{x}_r = g^{-1}(p)$ is close enough to $x_r$
according to some convergence criterion, then stop; otherwise go back to
Step 1.

In the following subsections details of the basic version are considered and,
using the results of computer experiments, some suggestions are made
concerning the details of the final algorithm and the values of the constants.

3.2 The initial interval [a_0, b_0]

The interval $[a_0, b_0]$ should contain the main bulk of $f(x)$, that is, the
part around $x_r$, so that the first approximation is reasonably near to
$f(x)$. Given $f(\cdot)$ and a suitable Monte Carlo method, by halving and
doubling the steplength an initial interval $[a_0, b_0]$ can be selected which
contains $x_r$ with large probability (for example, assume that
$f(a_0) \le p - 3\sigma$ and $f(b_0) \ge p + 3\sigma$ hold; near equality is
preferred). The initial points $x_{0j}$, $j = 1, \ldots, K$, can be selected at
equal distances (previously computed pairs $\{x_i, p_i\}$ can be included), and
they might be dispersed over a wider interval as well; that is,
$f(a_0) \approx \delta$, $f(b_0) \approx 1 - \delta$ with
$\delta = 0.01 - 0.1$ is acceptable as well.

We have to avoid cases in which several points $x_{0j}$ produce very small
(near 0.01) or very large (near 0.99) function values, because this would
result in a very poor initial approximation $g(x)$, causing very slow
convergence of the root finding algorithm. Generally we have some previous
knowledge of the function, since some feasible solutions of the optimization
problem, for which $f$ takes on values greater than $p$, are available, and
this information helps in constructing the initial interval. In short, the
initial interval should be wide enough to contain the meaningful part of $f$,
and small enough to facilitate convergence.

An easy and practical way to ensure that the initial interval is well located
(more or less symmetric around the root, see Subsection 3.4) is to count the
number of function values greater than $p$, which should be the same (or almost
the same) as the number of function values smaller than $p$.

3.3 The reduction factor q 

This constant governs the speed of decrease of the intervals (first the
decrease of the function value interval $[\alpha_i, \beta_i]$, then, by using
the transformation $g^{-1}$, that of $[a_i, b_i]$). There are two somewhat
contradictory aims in determining $q$ and the intervals. First, we would like
to decrease $[\alpha_i, \beta_i]$ as fast as possible, so that it becomes very
small around $p$ and $[a_i, b_i]$ becomes small around $x_r$; the new points
$x_{ij}$ would then be near the real root, which is the point where we want our
approximation $g(x)$ to be very close to $f(x)$. Secondly, a very fast decrease
($q \approx 0.1$) would very soon result in a pointlike interval $[a_i, b_i]$
which with large probability would not contain $x_r$, so consecutive intervals
$[a_i, b_i]$ would jump up and down along the line, trying to locate the root,
and we would lose stability and speed.

The best choice is to have a sequence of intervals
$[a_i, b_i] \supset [a_{i+1}, b_{i+1}] \supset \ldots \ni x_r$. According to
this consideration the choice $q \approx 0.9$ is preferred. Furthermore, the
value of $q$ is related to the number $K$ of newly generated points. Since $K$
new points would ideally reduce the error of the estimator around $x_r$ by
about a factor $1/\sqrt{K}$, the product $q\sqrt{K}$ should be a constant
greater than 1 (to preserve stability). Computational experience suggests
taking $q\sqrt{K} \approx 2$.
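The rule of thumb $q\sqrt{K} \approx 2$ and the resulting sequence of half lengths can be written down in a few lines. The function names below are hypothetical, and the constant 2 is the empirical value quoted in this subsection, not a derived quantity.

```python
import math

def reduction_factor(K, c=2.0):
    """Reduction factor q chosen so that q * sqrt(K) = c (c = 2 is the empirical value).

    The result is only usable as a reduction factor if it is below 1, i.e. K > c**2.
    """
    return c / math.sqrt(K)

def half_lengths(sigma0, q, n_iter):
    """Half lengths sigma_i = q * sigma_{i-1} for i = 1, ..., n_iter."""
    out, s = [], sigma0
    for _ in range(n_iter):
        s *= q
        out.append(s)
    return out

def value_interval(p, sigma_i):
    """Function value interval [alpha_i, beta_i] clipped to [0, 1]."""
    return max(p - sigma_i, 0.0), min(p + sigma_i, 1.0)
```

For K = 10 this gives a factor of about 0.63, close to the value 0.6 used in the computational tests of Section 5.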

3.4 Symmetrization of [a_i, b_i]

If the points are not generated symmetrically, the approximation would be
lopsided, that is, accurate in regions that are not interesting for us, so the
best strategy seems to be to determine $[a_i, b_i]$ symmetrically around $x_r$.
Since the real root is not known at the time of the computation, we determine
$[a_i, b_i]$ symmetrically around the actual approximation, also taking into
account the derivative of the function $g$. Compare the differences in the
function values by evaluating the fraction
$o = (p - g(x_r - \delta))/(g(x_r + \delta) - p)$, where $\delta$ is the
probable half length of the next interval,
$x_r - a_{i+1} \approx b_{i+1} - x_r$ (we can set $\delta = 0.4(x_r - a_i)$),
and determine the new function value interval as
$\alpha_{i+1} = p - o\,\sigma_i$, $\beta_{i+1} = p + \sigma_i$.

So the determination of the intervals is done by first making a symmetrical
interval around the root, then determining the function value interval and
finally computing the interval where new points are generated.
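A minimal sketch of the symmetrization step, assuming the current estimator is available as a callable `g` that is increasing near the approximate root; the function name and argument order are illustrative only.

```python
def symmetrized_value_interval(g, p, xr, a_i, sigma_i):
    """Return the ratio o and the asymmetric function value interval.

    g       : current estimator (callable), assumed increasing around xr
    p       : reliability level
    xr      : current approximate root, g(xr) close to p
    a_i     : left end of the current interval
    sigma_i : current half length in function values
    """
    delta = 0.4 * (xr - a_i)                      # probable half length of the next interval
    o = (p - g(xr - delta)) / (g(xr + delta) - p)  # ratio of left and right value differences
    return o, (p - o * sigma_i, p + sigma_i)
```

For a linear `g` the ratio is 1 and the interval is symmetric in function values; for a concave increasing `g` the ratio exceeds 1, so a wider band below $p$ is mapped back to an interval that is roughly symmetric in $x$.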

3.5 First and second phase 

To safeguard against the unpleasant consequences of having a very small
interval $[a_i, b_i]$ it might be very useful to implement a second phase in
the algorithm: if the function value interval becomes very small, we do not
decrease its length any more (or we change it much more slowly). Assume that we
want to determine a root with accuracy $\varepsilon$, that is,
$|f(x_r) - p| < \varepsilon$. Then in case of
$\beta_i - \alpha_i < \varepsilon$ we change the value of the reduction factor
by setting $q = 1$ (or $q = 0.99$). From this point on $\alpha_i$ and $\beta_i$
remain the same, but $a_i = g^{-1}(\alpha_i)$, $b_i = g^{-1}(\beta_i)$ are
still updated (since the function $g$ is changed in each iteration).

3.6 Stability of the estimator 

Unfortunately the concavity of the approximating function cannot always be
guaranteed, due to possible large errors in the values of $p_i$. So $g$ can
become a convex function, and a full computer implementation of the
root-finding algorithm has to be prepared to handle this case too. Generally it
happens in the first or second step of the iterative procedure, when the number
of points $x_{ij}$ is rather small.

This "flipping over" is also an indication that the function $f$ is very flat
compared to the error $\sigma$, so increased accuracy (greater $K$, or greater
$q$) is required. Handling it involves checking the negativity of the constant
$a$, and if $a > 0$ then the greater root of the quadratic function $g$ has to
be accepted (the truncation in this case would give the right-hand side of a
convex function). This phenomenon may also occur in the case of a "bad" initial
interval.

3.7 Truncation in case of high reliability

When the reliability level $p$ is very high, $p \approx 0.95 - 0.999$ (or very
small, $p \approx 0.05 - 0.001$), the truncation of the quadratic function
should be changed. The original truncation, replacing the decreasing part by a
constant, can severely restrict the upward movement of the approximate root. A
new truncation rule could be the following. Select a point $x_{tt}$ a little
smaller than the original truncation point, that is, let
$x_{tt} = x_t - \epsilon$, where
$\epsilon \approx (b_i - a_i)/10, \ldots, (b_i - a_i)/100$, and use a linear
increasing function in the truncated part, which is the tangent line of the
function $g(x)$ at the point $x_{tt}$, until it reaches the value 1 (this kind
of truncation is called linear truncation, as opposed to the previous constant
truncation). That is, let the line be given by $a_t x + b_t$; then we have the
equations $a_t x_{tt} + b_t = g(x_{tt})$, $2a x_{tt} + b = a_t$, from which the
constants $a_t$ and $b_t$ can be expressed. The final form of $t(x)$ (for the
initial estimator) becomes



$$t(x) = \begin{cases} g_1(x), & \text{if } -\infty < x < x_{tt}, \\ \min\{1, a_t x + b_t\}, & \text{if } x_{tt} \le x < (1 - b_t)/a_t. \end{cases} \qquad (10)$$
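The linear truncation (10) can be sketched as follows for a concave quadratic ($a < 0$). The function name is hypothetical, and the value 1 is also returned to the right of $(1 - b_t)/a_t$, which is an assumption of this sketch since (10) does not state the estimator there.

```python
def linear_truncation(a, b, c, eps):
    """Build t(x) from g(x) = a*x^2 + b*x + c with a tangent-line continuation.

    Assumes a < 0 and eps > 0, so the tangent slope a_t at x_tt is positive.
    Returns the estimator together with x_tt, a_t and b_t.
    """
    xt = -b / (2.0 * a)              # original truncation point
    xtt = xt - eps                   # shifted truncation point
    at = 2.0 * a * xtt + b           # a_t = g'(x_tt)
    bt = (a * xtt + b) * xtt + c - at * xtt   # from a_t * x_tt + b_t = g(x_tt)

    def t(x):
        if x < xtt:
            return (a * x + b) * x + c
        return min(1.0, at * x + bt)

    return t, xtt, at, bt
```

For example, with $g(x) = x - x^2/4$ and $\epsilon = 0.2$ the tangent is attached at $x_{tt} = 1.8$ and reaches the value 1 at $x = 1.9$, whereas the constant truncation would stay below 1 up to $x_t = 2$.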



3.8 Updating the estimator 

The computation of $f(x)$ is done by a Monte Carlo method, so its error
(standard deviation) $\sigma$ determines the necessary amount of work in
evaluating $f(x)$. The computationally most demanding part of the root finding
algorithm is the determination of the estimator $g(x)$, when the function
$g(x)$ is recomputed using the old and new points in $S$. We show that,
neglecting the first few steps of the algorithm, a simple updating procedure
can be used, thus saving a considerable amount of time. This modification is
described for the logarithmic estimator $t_2(x)$; for the reverse-logarithmic
estimator and the initial estimator the same steps are to be made.

In evaluating the parameters of $t_2(x) = \exp(a_2 x^2 + b_2 x + c_2)$ we have
expressions like $q0 = \ldots$, $x3 = \ldots$ (for details see [7], [4]).
Assume that in the last iteration of the root finding procedure we had $I$
points in $S$, and $K$ points $\{x_{ij}, p_{ij}\}$, $j = 1, \ldots, K$, are
additionally determined. Then the new expressions $q0, \ldots, x3$ appearing in
the formulas for $a_2, b_2, c_2$ can be obtained as



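$$q0_{\text{new}} = \frac{I}{I+K}\, q0 + \frac{K}{I+K} \sum_{j=1}^{K} (\cdots), \quad \ldots, \quad x4_{\text{new}} = \frac{I}{I+K}\, x4 + \frac{K}{I+K} \sum_{j=1}^{K} (\cdots),$$

where the summands $(\cdots)$, which involve only the $K$ new points, are not
legible in the source.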






This updating scheme is possible because the weights are basically the same if
the interval from which the values $x_{ij}$ come is small enough (in the root
finding algorithm of the following section this will already be true from the
second or third iteration on).
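The structure of the update, a convex combination of the stored quantity and the contribution of the $K$ new points, can be illustrated for plain averages. This is a sketch under the assumption that each stored expression is an equally weighted average over the points (for example the average of $x^4$); the weighted expressions of the paper are not reproduced here, and the function name is hypothetical.

```python
def update_average(old_avg, n_old, new_terms):
    """Combine an average over n_old points with the terms of the new points.

    Implements  new = I/(I+K) * old + K/(I+K) * (mean of the K new terms),
    so the old points never have to be revisited.
    """
    k = len(new_terms)
    n = n_old + k
    return (n_old / n) * old_avg + (k / n) * (sum(new_terms) / k)

# Example: maintaining the average of x^4 when new abscissae arrive.
old_x = [1.0, 2.0, 3.0]
x4 = sum(x ** 4 for x in old_x) / len(old_x)
new_x = [1.5, 2.5]
x4 = update_average(x4, len(old_x), [x ** 4 for x in new_x])
```

The updated value agrees with the average recomputed from all five points, at the cost of touching only the two new ones.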

4 Root finding algorithm - p-quantile determination

The algorithm is described for the estimator $t_1(x)$; to apply the same
algorithm to the logarithmic and reverse-logarithmic estimators just some minor
changes are to be made ($p_i$ is replaced by $q_i$ or $r_i$, the formula for
the ratio $o$ is to be changed, conditions on the roots being smaller than the
truncation point should be reversed for reverse-logarithmic estimators, etc.).

0. [Initialization.] Assume that we have an initial interval $[a_0, b_0]$ and
$K$ points $x_{0j}$ in it (given maybe at equal distances) and the "noisy"
function values $p_{0j} \approx f(x_{0j})$; furthermore, the estimator $g(x)$
using the points $S_0 = \{x_{0j}, p_{0j}\}_{j=1}^{K}$ is already computed. Set
the initial values $\sigma_0 = \sigma/q$, $i = 0$, steplength
$stl = (b_0 - a_0)/K$.

1. [Computation of the length of the function value interval.] Increase the
iteration counter, $i = i + 1$, compute $\sigma_i = q\,\sigma_{i-1}$ and the
truncation point $x_t = -b/(2a)$.

2. [Computation of the preliminary interval $[\alpha_i, \beta_i]$.] Determine
the approximate root $x_r$ from the equation $g(x_r) = p$. If $a < 0$, then
accept the root which is smaller than $x_t$ (or set $x_r = x_t$ if there is no
solution to the equation). If $a > 0$, then accept the root which is greater
than $x_t$ (or set $x_r = x_t$ if there is no solution to the equation).
Compute $\delta = 2 \cdot stl$ and let $x^- = x_r - \delta$,
$x^+ = x_r + \delta$. The ratio $o = (p - g(x^-))/(g(x^+) - p)$ indicates
approximately the relation of the left- and right-hand side derivatives. Let
$\alpha_i = \max(p - o\,\sigma_i, 0.0001)$,
$\beta_i = \min(p + \sigma_i, 0.9999)$.

3. [Computation of the interval $[a_i, b_i]$.] Compute the values
$a_i = g^{-1}(\alpha_i)$, $b_i = g^{-1}(\beta_i)$. If $a < 0$, select those
roots that are smaller than $x_t$; if $a > 0$, then select the greater roots
(if no solution exists, then take $a_i = x_t$, $b_i = a_i + 0.1\delta$).

4. [Updating $g(x)$.] Select $K$ new points in $[a_i, b_i]$ by letting
$x_{ij} = (b_i - a_i)(j - 1)/(K - 1) + a_i$, $j = 1, \ldots, K$, and compute by
a Monte Carlo method the noisy function values $p_{ij} \approx f(x_{ij})$. Let
the set of new points be $S_i = \{x_{ij}, p_{ij}\}_{j=1}^{K}$, and
$S = \bigcup_{l=0}^{i} S_l$. Compute the new approximation $g(x)$ using all
points in $S$, and determine the new approximate root $x^* = g^{-1}(p)$ (if
$a < 0$, then choose the smaller root, otherwise the greater one, and set
$x^* = x_t$ if no solution exists).

5. [Test of convergence.] If $N_0 < I = (i + 1)K$ then stop, where $N_0$ is a
prescribed number of function evaluations. Otherwise set $x_r = x^*$,
$stl = 2(b_i - a_i)/K$ and go back to Step 1.

The second phase is not built into this version, so the modifications
described in Subsection 3.5 are to be incorporated if needed. Also, the new
truncation rule suggested for large $p$ was omitted. For the sake of clarity
the updating scheme proposed in Subsection 3.8 was also left out, but it should
be incorporated whenever speed is important. Instead of the content of Step 5
other stopping rules can be used as well; the simplest one, described above,
stops the algorithm after a prescribed number of iterations. The necessary
number $N_0$ of iterations can be determined from the equation
$\epsilon = \sigma/\sqrt{(N_0 + 1)K}$, where $\sigma$ is the standard deviation
in the Monte Carlo computation of one function evaluation and $\epsilon$ is the
required precision of the result measured in the function value difference
$|f(x_r) - f(\hat{x}_r)|$ (this formula is based on computational experience,
see Section 6). Another possibility for the stopping rule is to take
$|g(x_r) - g(x^*)| < \epsilon_1$ (or $|x_r - x^*| < \epsilon_2$, with some
prescribed precisions $\epsilon_1, \epsilon_2$) and stop if the condition is
satisfied.
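Steps 0 to 5 can be put together in a compact sketch for the initial estimator. It is an illustration under several assumptions rather than the authors' implementation: the fit is an unweighted quadratic regression solved by Cramer's rule, a fixed number of iterations replaces the test in Step 5, the start value $\sigma_0 = \sigma/q$ is taken as read from Step 0, and the ratio $o$ falls back to 1 whenever it would not be positive (a guard that is not part of the description above). All function names are hypothetical.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_quadratic(points):
    """Unweighted least-squares fit of g(x) = a*x^2 + b*x + c; returns (a, b, c)."""
    s = [sum(x ** k for x, _ in points) for k in range(5)]       # sums of x^k
    t = [sum(p * x ** k for x, p in points) for k in range(3)]   # sums of p*x^k
    m = [[s[4], s[3], s[2]], [s[3], s[2], s[1]], [s[2], s[1], s[0]]]
    rhs = [t[2], t[1], t[0]]
    d = det3(m)
    coef = []
    for col in range(3):                       # Cramer's rule, column by column
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        coef.append(det3(mc) / d)
    return tuple(coef)

def inverse(coef, y):
    """Solve g(x) = y on the increasing branch; return x_t if no solution (a != 0)."""
    a, b, c = coef
    xt = -b / (2.0 * a)
    disc = b * b - 4.0 * a * (c - y)
    if disc < 0:
        return xt
    r = math.sqrt(disc) / (2.0 * abs(a))
    return xt - r if a < 0 else xt + r        # smaller root if concave, greater if convex

def find_root(noisy_f, p, a0, b0, sigma, K=10, q=0.6, n_iter=9):
    """Approximate the root of f(x) = p from noisy evaluations noisy_f(x)."""
    # Step 0: K equidistant points, first fit, first approximate root.
    S = [(x, noisy_f(x)) for x in (a0 + (b0 - a0) * j / (K - 1) for j in range(K))]
    coef = fit_quadratic(S)
    xr = inverse(coef, p)
    sig, stl = sigma / q, (b0 - a0) / K
    for _ in range(n_iter):
        sig *= q                                           # Step 1
        a, b, c = coef
        g = lambda x: (a * x + b) * x + c
        delta = 2.0 * stl                                  # Step 2
        num, den = p - g(xr - delta), g(xr + delta) - p
        o = num / den if num > 0 and den > 0 else 1.0      # fallback is an assumption
        alpha = max(p - o * sig, 0.0001)
        beta = min(p + sig, 0.9999)
        ai, bi = inverse(coef, alpha), inverse(coef, beta)  # Step 3
        if bi <= ai:
            bi = ai + 0.1 * delta
        for j in range(K):                                 # Step 4
            x = ai + (bi - ai) * j / (K - 1)
            S.append((x, noisy_f(x)))
        coef = fit_quadratic(S)
        xr = inverse(coef, p)
        stl = 2.0 * (bi - ai) / K                          # Step 5 (fixed iteration count)
    return xr
```

With `noisy_f` replaced by a Monte Carlo evaluator of $\Phi(z + x(d - z))$ and `sigma` set to its standard deviation, the defaults `K=10`, `q=0.6`, `n_iter=9` correspond to the settings reported in Section 5. A noise-free quadratic test function is a convenient sanity check, since the fit is then exact in every iteration.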

5 Computational results 

In the computer tests a crude Monte Carlo technique was used for computing the
values $\Phi(\cdot)$ (that is, $f(\cdot)$) of the distribution function of the
$m$-dimensional normal distribution: 100 normally distributed vectors were
generated and then tested for whether they lie in the domain of integration
(this is the simplest importance sampling, see [2], [3]). The results had a
standard deviation of $\sqrt{p^*(1 - p^*)}/10$ if the function value to be
evaluated was $p^*$, so for $p^* = 0.5$ this gave an error of 0.05. The number
of points in each iteration was set to $K = 10$, the reduction factor to
$q = 0.6$ (see Subsection 3.3), and the number of iterations was kept fixed at
9, so altogether 100 evaluations of $f(\cdot)$ were performed in each run and
used to compute the final regression estimator.
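The crude Monte Carlo evaluation and its standard deviation can be sketched as follows, assuming independent standard normal components (as in the uncorrelated 50-dimensional example); correlated cases would additionally need a factor of the correlation matrix. The function names are hypothetical.

```python
import math
import random

def crude_mc(upper, n_samples, rng):
    """Estimate P(xi_1 <= upper_1, ..., xi_m <= upper_m) for independent N(0, 1) components."""
    hits = 0
    for _ in range(n_samples):
        if all(rng.gauss(0.0, 1.0) <= u for u in upper):
            hits += 1
    return hits / n_samples

def mc_std(p_star, n_samples):
    """Standard deviation sqrt(p*(1 - p*)/n) of the crude estimate."""
    return math.sqrt(p_star * (1.0 - p_star) / n_samples)

def precision_after(sigma, n_iter, K):
    """Precision eps = sigma / sqrt((N0 + 1) * K) from the stopping rule of Section 4."""
    return sigma / math.sqrt((n_iter + 1) * K)
```

With 100 samples and $p^* = 0.5$, `mc_std` gives 0.05, and `precision_after(0.05, 9, 10)` gives 0.005, which matches the order of the errors reported in the tables.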

The errors were measured in function value differences, since the values of
the parameter $x$ can be changed by scaling. The entry "error" in the tables
was calculated as $|f(x_r) - f(\hat{x}_r)|$. The "true root" was computed by
the same algorithm, with increased accuracy and more sample points, so it is an
approximation only. A great number of computer runs were performed during the
testing; only a small portion of the results is presented here. These results
are neither the best nor the worst; they show just average behaviour. The
descriptions of the four main examples are given in the Appendix.

The sequence of approximate roots produced by the root finding procedure
depends on the specific estimator used in the root finding algorithm (which
could be any of the initial, logarithmic and reverse-logarithmic estimators);
this will be called the main estimator and its name is given in boldface. Using
the points $x_i$ and function values $p_i$ determined by this main estimator,
the other estimators can be fitted to $S$ and approximate roots computed for
the other two estimators as well. To give an introductory picture, a set of
results is shown in the first table.

There seems to be no discernible difference between the performance of the
main estimator and the results obtained from the other estimators. Also, there
seems to be no difference in using different estimators as the main estimator.
To illustrate this statement the following set of results, concerning Example
4, is offered in the second table. Three different runs were performed; the
main estimator's name is given in boldface. Here we have $m = 50$ dimensions,
uncorrelated components, reliability level $p = 0.95$, $\sigma = 0.05$, the
true root is $x_r = 6.368$, and the initial interval was
$[a_0, b_0] = [3.0, 10.2]$, for which we have $f(3.0) = 0.79$,
$f(10.2) = 0.97$.



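No.       Dimens.   Rel.      True root  Estim.    Root      Error
exmp      m         p         x_r                  x_r

1.        2         0.8       1.1114     Init      [illeg.]  [illeg.]
                                         Log       [illeg.]  [illeg.]
                                         R-log     [illeg.]  0.0048
1.        2         [illeg.]  [illeg.]   Init      1.8193    [illeg.]
                                         Log       1.8679    [illeg.]
                                         R-log     [illeg.]  [illeg.]
[illeg.]  2         0.8       2.5131     Init      2.5012    [illeg.]
                                         Log       2.4887    [illeg.]
                                         R-log     2.5741    [illeg.]
[illeg.]  10        0.8       2.510      [illeg.]  2.479     [illeg.]
                                         [illeg.]  2.468     [illeg.]
                                         [illeg.]  2.478     0.0043
[illeg.]  10        0.9       3.910      Init      3.936     [illeg.]
                                         Log       3.852     0.0024
                                         R-log     3.910     [illeg.]
[illeg.]  [illeg.]  [illeg.]  5.895      Init      5.7241    0.0024
                                         Log       5.8273    0.0010
                                         R-log     5.9986    0.0012
[illeg.]  50        0.9       4.276      Init      [illeg.]  [illeg.]
                                         Log       4.164     [illeg.]
                                         R-log     4.2158    0.0029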



No.   Final interv.  f(a_9)      f(b_9)      Estim.    Root    Error
run   [a_9, b_9]                                       x_r

1.    [6.56, 6.69]   p + 0.0024  p + 0.0039  [illeg.]  6.560   0.0024
                                             [illeg.]  6.551   0.0022
                                             [illeg.]  6.845   0.0057
2.    [6.62, 6.71]   p + 0.0031  p + 0.0041  Init      6.687   0.0030
                                             Log       6.676   0.0037
                                             R-log     6.963   0.0066
3.    [6.79, 6.92]   p + 0.0050  [illeg.]    Init      6.736   0.0044
                                             Log       6.727   0.0043
                                             R-log     6.849   0.0056



Next, to show how the results differ from each other for different initial
intervals, we give the results of four different runs for Example 4 in the
third table (with $p = 0.9$, $\sigma = 0.05$, the true root being
$x_r = 4.2716$); the main estimator was always the initial estimator $t_1(x)$.



No.   Init. interv.  f(a_0), f(b_0)  Final interv.  Estim.    Root      Error
run   [a_0, b_0]                     [a_9, b_9]               x_r

1.    [2.5, 7.0]     0.62, 0.95      [4.23, 4.25]   Init      4.275     0.0001
                                                    Log       4.272     0.0001
                                                    R-log     4.351     0.0032
2.    [1.0, 5.5]     0.002, 0.93     [4.03, 4.05]   (estimator entries illegible;
                                                    a single root value, 4.139,
                                                    survives)
3.    [1.0, 8.2]     0.002, 0.96     [illeg.]       Init      4.295     [illeg.]
                                                    Log       4.308     0.0015
                                                    R-log     4.437     0.0067
4.    [1.0, 8.2]     0.002, 0.96     [4.32, 4.35]   Init      4.335     [illeg.]
                                                    Log       4.336     [illeg.]
                                                    R-log     4.443     [illeg.]





Reliability  True root  Estimator  Root      Error
p            x_r                   x_r       |f(x_r) - f(\hat{x}_r)|

0.5          0.8915     Init       0.8806    0.0036
                        Log        0.8786    0.0043
                        R-log      0.8854    0.0021
0.5          0.8915     Init       0.8780    [illeg.]
                        Log        0.8734    [illeg.]
                        R-log      0.8843    0.0024
0.5          0.8915     Init       0.8888    [illeg.]
                        Log        0.8860    0.0019
                        R-log      0.8959    0.0014
0.8          2.513      Init       2.495     [illeg.]
                        Log        2.492     [illeg.]
                        R-log      2.545     0.0034
0.8          2.513      Init       2.5265    [illeg.]
                        Log        2.5214    [illeg.]
                        R-log      2.5633    [illeg.]
0.8          2.513      Init       2.5063    0.0008
                        Log        2.5004    0.0013
                        R-log      2.5468    0.0035
0.95         5.8907     Init       5.655     [illeg.]
                        Log        [illeg.]  [illeg.]
                        R-log      [illeg.]  [illeg.]




In the third table of results it is worthwhile to have a closer look at the
first and the second run, where all the estimators gave final roots outside the
final interval, showing that the algorithm is self-correcting. Note the
outstanding performance of the first run, which is due to the choice of a good
initial interval $[a_0, b_0]$ (it is almost symmetrical around the root, but
this does not show in the function values: $f(a_0) = 0.62$ and $f(b_0) = 0.95$
are not symmetric around $f(x_r) = 0.9$). The second run indicates what a badly
placed initial interval can do to the final result.

Numerical results strongly support the need for the initial interval to contain
the main bulk of the density function around the real root.

The last set of results (the fourth table) is given for Example 3 to illustrate
that the performance of the method does not depend on the probability $p$ to be
computed ($m = 10$, $\sigma = 0.05$, initial interval varying; for the first
run $[a_0, b_0] = [0.0, 1.8]$, $f(a_0) = 0.05$, $f(b_0) = 0.75$).



6 Remarks 

At present the behaviour of the algorithm is not yet fully explained; the
remarks below are conjectures, based on extensive numerical computations. The
most important conjecture is that, with more or less properly set parameters,
the error of approximation around $x_r$ (that is, the difference of the
function values $|f(x_r) - f(\hat{x}_r)|$) decreases as $\sigma/\sqrt{N}$,
where $N$ is the number of all points in the set $S$ and $\sigma$ is the error
of one function evaluation (so in the numerical examples we registered errors
less than $0.005 = 0.05/10$ almost everywhere). Probably this phenomenon is due
to the property of the algorithm that it very quickly focuses on a small
interval, where the function we are estimating is rather flat compared to the
error $\sigma$.

In other words, if we want to obtain a root with an error $\epsilon$, then
instead of trying to compute the function $f(x)$ with error $\epsilon$ for some
fixed $x$ values in order to figure out the location of the root $x_r$, we
should use the root finding algorithm with function values computed with error
$10\epsilon$, running 10 iterations with 10 points in each. In the light of
computational experience this root finding algorithm then produces an
approximate root with deviation $\epsilon$ in the neighbourhood of the root,
and this is accomplished at the cost of computing only one function value with
an error $\epsilon$.

Furthermore, this root finding procedure gives an approximation $g(x)$ that has
about the same error on and around the final interval, not only at a point, so
it becomes very easy to safeguard against leaving the region of feasible
solutions (e.g. this property can be utilized in barrier methods). We can
compute, for example, a "safe" root $x_s$, for which $f(x_s) = p + \delta$,
where $\delta$ is a prescribed safety margin. If the error $\epsilon$ of the
root finding procedure is less than $\delta$, then $x_s$ is in the feasible
region with high probability.

The root finding algorithm is self-correcting in the following sense. Assume
that in the $i$-th iteration an interval $[a_i, b_i]$ was determined which does
not contain the true root $x_r$. If in the next iteration(s) more points are
added to $S$ in $[a_i, b_i]$, then after a while the approximation becomes
sufficiently close to the function $f(x)$, and will generate another interval,
nearer to the root or containing the root.

The algorithm can be applied for non-increasing directions as well; just the
selection of the proper root has to be appropriately modified, depending on the
problem at hand.

7 Appendix 

The details of the numerical examples used in the computer experiments are
given here. Two of the four listed examples describe 2-dimensional normal
distributions, one is a 10-dimensional example and the last one is a
50-dimensional distribution. The examples are described as follows: for a given
distribution function $\Phi(\cdot)$ of the $m$-dimensional normal distribution
the function $f(x)$ is used in the numerical computations, where

$$f(x) = \Phi(z + x(d - z)).$$

Example 1. Dimension $m = 2$, $z = (0, 1.5)$, $d = (1, 0)$, the correlation
matrix $R$ is given by $r_{12} = r_{21} = -0.9$, $r_{11} = r_{22} = 1.0$.

Example 2. Dimension $m = 2$, $z = (0.8, 1.5)$, $d = (2.4, 1.6)$, the
correlation matrix $R$ is given by $r_{12} = r_{21} = 0.95$,
$r_{11} = r_{22} = 1.0$.

Example 3. Dimension $m = 10$,
$z = (1.1, 0.3, 0, 1.1, -0.3, 0.8, 1.2, 3, 1, 1.6)$,
$d = (1.2, 2, 0.5, 1.9, 3, 2.1, 2, 3.1, 1.9, 1.8)$, the correlation matrix $R$
is given by $r_{12} = r_{21} = 0.9$, $r_{34} = r_{43} = -0.95$,
$r_{56} = r_{65} = 0.5$, $r_{78} = r_{87} = -0.9$,
$r_{9,10} = r_{10,9} = 0.8$, $r_{ii} = 1.0$ for $i = 1, \ldots, 10$; all other
elements of $R$ are zero.

Example 4. Dimension $m = 50$, z = (1.1, 0.3, 0, 1.1, -0.3, 0.8, 1.2, 3, 1, 1.6,
-3, 1, 1.2, 2.1, -1.1, -0.8, -0.2, 0.3, 1.3, 1.8, 1.2, 0.4, 0.6, 2.1, -0.6,
-0.3, -2.1, 2.2, 3.1, 0.4, 0.3, 0.7, 0.8, 1.3, -1.3, -1.1, -0.4- 0.1, 0.5, 0.8,
0.2, 2, 2.1, -0.6, 0.5, 0.9, 1.1, 1.3, 6, 1), d = (1.2, 2, 0.5, 1.9, 3, 2.1, 2,
3.1, 1.9, 1.8, -1, 1.3, 3, 2.9, 0, 0.5, 2, 1.2, 2.1, 2.8, 4.6, 2.4, 2.1, 4,
1.2, 1.4, -0.4, 4, 6, 3, 2, 2.2, 2.8, 2.2, 2.1, 1, 1.4, 0.9, 2.2, 2.3, 1.9,
2.5, 4.1, 1.2, 2.5, 2.4, 1.9, 1.8, 9, 3), the correlation matrix is the
identity matrix, that is, $r_{ii} = 1.0$, $i = 1, \ldots, 50$, and
$r_{ij} = 0$ if $i \ne j$.

Remark. This work was supported by grants from the National Scientific
Research Fund, Hungary, T13980 and T16413, and the "Human Capital and
Mobility" of the European Union ERB CJHR XTC 930087.

Summary: Many problems in mechanics may be described by ordinary differential
equations (ODE) and can be solved numerically by a variety of reliable
numerical algorithms. For optimization or sensitivity analysis often the
derivatives of final values of an initial value problem with respect to certain
system parameters have to be computed. This paper discusses some subtle issues
in the application of Algorithmic (or Automatic) Differentiation (AD)
techniques to the differentiation of numerical integration algorithms. Since AD
tools are not aware of the overall algorithm underlying a particular program,
and apply the chain rule of differential calculus at the elementary operation
level, we investigate how the derivatives computed by AD tools relate to the
mathematically desired derivatives in the presence of numerical artifacts such
as stepsize control in the integrator. As it turns out, the computation of the
final time step is of critical importance. This work illustrates that AD tools
compute the derivatives of the program employed to arrive at a solution, not
just the derivatives of the solution that one would have arrived at with
strictly mathematical means, and that, while the two may be different,
high-level algorithmic insight allows for the reconciliation of these
discrepancies.



Keywords: Algorithmic Differentiation, Sensitivity Analysis, Multibody Sys- 
tems, Automatic Differentiation, Differential Equations, Numerical Integra- 
tion 



1 Ordinary Differential Equations and 
Multibody Systems 

Ordinary differential equations appear in many fields of science and
technology. Although the modeling and description of systems may be very
different, the resulting differential equations can frequently be divided into
a few major categories, e.g. ordinary differential equations or partial
differential equations. Only for some relatively simple types of differential
equations can one provide a closed-form analytical solution. Even if this is
possible, the mathematical derivation can be difficult and cumbersome.
Therefore, differential equations are usually solved by numerical integration
algorithms. Many research groups have developed reliable and efficient
algorithms and a large body of literature is devoted to this subject, see e.g.
[14], [5], [12].

In this paper we restrict ourselves to the solution of nonlinear ordinary 
differential equations. The resulting initial value problem may then be for- 
mulated as follows: 

For a given real vector of system parameters $p$ find the trajectories
$x(t, p)$, $x \in \mathbb{R}^n$, for $t^0 \le t \le t^1$, where $x$ is the
state vector, $t$ the time, and $t^0$ and $t^1$ the initial time and the final
time, respectively. The states are determined by the solution of the initial
value problem

$$\dot{x} = f(x, p, t), \qquad x(t = t^0, p) = x^0, \qquad (1)$$

where $f$ is the vector of (usually nonlinear) state derivatives and $x^0$ is
the initial state.
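For orientation, an initial value problem of the form (1) can be integrated with the classical fourth-order Runge-Kutta scheme. The sketch below uses a fixed step size, which is a simplifying assumption (production integrators use stepsize control), and the function name `rk4` and its signature are illustrative only.

```python
def rk4(f, x0, p, t0, t1, n_steps):
    """Integrate x' = f(x, p, t) from t0 to t1 with n_steps fixed RK4 steps.

    f returns the list of state derivatives; x0 is the initial state (a list).
    """
    h = (t1 - t0) / n_steps
    x, t = list(x0), t0
    for _ in range(n_steps):
        k1 = f(x, p, t)
        k2 = f([xi + 0.5 * h * ki for xi, ki in zip(x, k1)], p, t + 0.5 * h)
        k3 = f([xi + 0.5 * h * ki for xi, ki in zip(x, k2)], p, t + 0.5 * h)
        k4 = f([xi + h * ki for xi, ki in zip(x, k3)], p, t + h)
        x = [xi + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
        t += h
    return x

# Example: linear decay x' = -p*x with x(0) = 1, whose solution is exp(-p*t).
decay = lambda x, p, t: [-p[0] * x[0]]
x_final = rk4(decay, [1.0], [2.0], 0.0, 1.0, 100)
```

The final state `x_final` depends on the parameter vector `p`, which is exactly the dependence whose derivative is of interest in sensitivity analysis.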

Our interest in the differentiation of integration algorithms arose during
investigations related to the optimization of multibody systems. The multibody
system approach can be applied successfully to the description and analysis of
mechanical systems where the individual parts undergo large translational and
rotational displacements, whereas the deformation of the parts themselves is
neglected [13]. The basic elements of a multibody system model are rigid
bodies, coupling elements including springs, dampers or active force elements,
and joints such as bearings or ideally position controlled elements, see
Fig. 1.

The generalized coordinates describing e.g. translations or rotations of bodies
are summarized in a vector $y \in \mathbb{R}^f$. Analogously, the translational
and angular velocities of the individual bodies are described by generalized
velocities $z \in \mathbb{R}^g$, where we have $f = g$ for holonomic multibody
systems. The position and velocity for the whole time history are then
determined by the differential equations of motion and the implicit initial
conditions

$$\dot{y} = v(t, y, z, p), \qquad \Phi^0(t^0, y^0, p) = 0,$$
$$M(t, y, p)\,\dot{z} + k(t, y, z, p) = q(t, y, z, p), \qquad \dot{\Phi}^0(t^0, y^0, z^0, p) = 0, \qquad (2)$$

which can be found from Newton's law, Euler's law, and d'Alembert's principle,
see [13]. Here $M$ is the mass matrix of the system, $q$ the vector of applied
forces, $k$ the vector of Coriolis forces, and $v$ the kinematics vector.

Fig. 1: Basic elements of a multibody system



The set of ordinary differential equations can be solved numerically by an
appropriate numerical integration scheme, for example, by Runge-Kutta methods
[12] or the Shampine-Gordon algorithm [14]. The final time $t^1$ is determined
by an implicit final condition

$$H^1(t^1, y^1, z^1, p) = 0. \qquad (3)$$

In the software system NEWOPT/AIMS [7], the fully symbolic generation of the
differential equations, even for very large multibody systems, is supported by
the NEWEUL approach [11].

The multibody systems modelling approach shown above is typical of many other
approaches in mechanics. However, in the sequel we deal with the notationally
simpler and more general formulation (1) instead of (2). No special use is made
of the structure of (2) throughout this paper.

2 Algorithmic Differentiation 

To illustrate the basic ideas of Algorithmic (or Automatic) Differentiation, we 
discuss the simple formula f = x\ , where / is the dependent variable and 
xi^X 2 are the independent variables with respect to differentiation. The goal 
is to compute the total derivative df/dx^ with x = [xi,X 2 ]'^. Differentiating 
by hand or using a formula manipulation program like Maple [6], one can 
easily find the solution df /dx^ = [1, 

On the other hand, to evaluate the function, i.e. to compute numerical 
values for the results, a compiler would transform the whole function into a 
sequence of simple elementary operations as shown in the middle of Fig.2. 
This view of a computer program is usually called a ‘computational graph’, 
and one can interpret automatic differentiation as an augmentation procedure 
of such a computational graph. This viewpoint, and generalizations thereof, 
are discussed in more detail in [4]. 

The total derivative may be evaluated using the chain rule, i.e. 

$$\frac{dx_i}{dx_j} = \sum_{P \in M_P(x_i, x_j)} \Bigl( \prod_{e \in P} (\text{partial derivative along arc } e) \Bigr) \qquad (4)$$

with 

$M_P$ ... set of all paths between $x_j$ and $x_i$, 
$P$ ... a single path from $M_P$, 
$e$ ... a single arc along $P$. 

Note that the required partial derivatives along the arcs can easily be com- 
puted, because at each node only basic operations are performed. The chain 
rule in (4) can be evaluated using different computation sequences, in the 
literature often called ‘modes’, see [10]. The different evaluation sequences 
of course have to yield the same results, but this freedom can be utilized to 
lower computational or storage complexity. The exploration of these tasks is 
a field of very active research, see [1]. 

The two traditional modes of automatic differentiation, the forward mode 
and the reverse mode, are both displayed in Fig.2 for our simple example. 

• Using the forward mode, a gradient vector $\nabla x_i$ is assigned to each 
intermediate node $x_i$, representing the total derivatives of this node 
with respect to all independent variables. Each arc is assigned the 
partial derivative of the ‘output node’ with respect to the ‘input node’. 
The gradient associated with a particular node is computed using the 
chain rule, at the same time as the value of the node is computed, 
resulting in a computational cost that increases linearly with the number 
of independent variables. 

• Using the reverse mode, only a scalar gradient $\bar{x}_i$ is assigned to each 
intermediate node. This scalar contains the total derivative of the 
dependent variable with respect to the intermediate node $x_i$. To calculate 
the required final gradient the computation sequence must be reversed, 
i.e. first the values of all intermediate nodes have to be computed and 
then the gradients are computed from the dependent to the independent 
variables and not from the independent to the dependent variable as in 
the forward mode. This requires storing or recomputing the values of 
all intermediate nodes, but the computation time no longer depends on 
the number of independent variables. 

[Figure: computational graph of the example function, annotated for the 
gradient forward mode and the gradient reverse mode] 

Fig. 2: Forward and reverse mode for the function: f = x_1 - 
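The two modes can be sketched in a few lines of Python. The function used here, f(x1, x2) = x1*x2 + sin(x1), is chosen purely for illustration and is not the example of Fig. 2; the names `f_forward` and `f_reverse` are hypothetical. The sketch writes the elementary operations out by hand, which is what an AD tool would generate mechanically.

```python
import math


def f_forward(x1, x2):
    """Forward mode: each node carries its value and its gradient w.r.t. [x1, x2]."""
    v1, g1 = x1, [1.0, 0.0]          # independent variable x1
    v2, g2 = x2, [0.0, 1.0]          # independent variable x2
    v3 = v1 * v2                     # elementary operation: product
    g3 = [v2 * a + v1 * b for a, b in zip(g1, g2)]
    v4 = math.sin(v1)                # elementary operation: sine
    g4 = [math.cos(v1) * a for a in g1]
    v5 = v3 + v4                     # elementary operation: sum (dependent variable)
    g5 = [a + b for a, b in zip(g3, g4)]
    return v5, g5


def f_reverse(x1, x2):
    """Reverse mode: compute all values first, then one scalar adjoint per node."""
    v1, v2 = x1, x2
    v3 = v1 * v2
    v4 = math.sin(v1)
    v5 = v3 + v4
    a5 = 1.0                         # derivative of f with respect to itself
    a3 = a5                          # v5 = v3 + v4
    a4 = a5
    a1 = a3 * v2 + a4 * math.cos(v1) # v1 feeds both v3 and v4
    a2 = a3 * v1                     # v2 feeds only v3
    return v5, [a1, a2]


value_fwd, grad_fwd = f_forward(0.5, 2.0)
value_rev, grad_rev = f_reverse(0.5, 2.0)
```

Both functions return the same gradient, [x2 + cos(x1), x1]. The forward sweep carries a two-component gradient through every node, while the reverse sweep needs the stored intermediate values but only one scalar per node.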



Automatic differentiation tools perform this derivative augmentation in a 
completely mechanical fashion, applying and generalizing the principles out- 
lined above to programs of arbitrary length and complexity. It should be 
noted, however, that the implementation approaches^ taken may differ con- 
siderably from the graph-oriented view outlined above. In our experiments, 
we employed the ADIFOR [3] tool. 

Note that AD tools do not have an understanding of the global behavior 
of the code they differentiate. Derivatives are computed for all variables 
whose values depend on the independent variables and impact the dependent 
variables (such variables are called active). So, in particular, AD tools do 
not know the mathematical considerations that a programmer went through 
in constructing a particular piece of code. 

3 Differentiation of Numerical Integration 
Algorithms 

In the optimization of mechanical systems, total derivatives of some final 
values of an initial value problem with respect to some system parameters often 
have to be computed. Thus, it is interesting to differentiate a numerical 
integration algorithm with AD techniques and investigate whether and under which 
circumstances this ‘differentiated integration algorithm’ really computes the 
correct total derivatives. 

^see http://www.mcs.anl.gov/Projects/autodiff/AD_Tools for an overview of current 
AD tools 

Other, highly specialized algorithms exist for similar purposes, e.g. the 
adjoint variable method [2], [8] to compute sensitivities of integral type 
performance criteria in multibody dynamics, but the development of such codes 
usually requires a lot of insight and always a lot of time. Therefore, the first 
goal of applying AD tools to numerical integration algorithms is to get the 
correct results and save ‘man time’, but not necessarily to save a lot of CPU-time. 

From the ODE (1) one usually cannot derive solutions $x(t)$ analytically. 
Therefore, numerical integration algorithms have to discretize and solve the 
problem in an appropriate way. If we restrict ourselves to single-step 
algorithms, this can be written as 

$$x_{i+1}(p) = x_i(p) + h_i(p)\,\Psi_i(p), \qquad (5)$$

where the index $i$ denotes a value at time $t_i$, $h_i$ is the actual stepsize and 
$\Psi_i$ is a slope estimation. If $h_i = h = \mathrm{const}$, we have an algorithm without 
stepsize control. Usually the slope estimation is an average of evaluations for 
different times and approximations. For constant stepsize and 
$\Psi_i = \dot{x}_i = f_i$ we have the simplest integration scheme, the explicit Euler 
algorithm. Note that the trajectory $x_i$ as well as the slopes $\Psi_i$ and the 
stepsize $h_i$ are all dependent on the system parameters $p$. Fig.3 shows a 
simplified description of the time-stepping loop of a typical explicit 
integration algorithm with stepsize control, where $g$ is some function that 
adjusts the time step. Methods I and II are two integration methods of different 
order. For simplicity, we ignored the fact that the time step will be adjusted 
upwards if there is a good fit. 

If, for a given $p$, we are interested in $dx/dp^T|_{t=t^1}$, we can employ an 
AD tool to differentiate this code with respect to $p$. If we differentiate with 
respect to $p$, and use $\nabla\bullet$ to denote $d\bullet/dp^T$, the chain rule of 
differential calculus now implies that 

$$\nabla\bigl(g(h_i, \delta)\bigr) = \frac{\partial g}{\partial h_i}\,\nabla h_i + \frac{\partial g}{\partial \delta}\,\nabla\delta. \qquad (6)$$

Clearly, $\nabla\delta \neq 0$ in general, as $\delta$ depends on $x$, which in turn depends on 
$p$. Thus we have the interesting situation that, when $\partial g/\partial\delta \neq 0$, the 
computational equivalent of time will have a nonzero derivative with respect to 
the parameter $p$. Viewed from an analytical perspective, this is nonsense: 
the values of time and the parameter are not related. From a computational 
perspective, however, it does make sense: depending on the value of the 
parameter, we may choose a different time discretization. 

    Given: parameter $p$, current time $t_i$, current solution $x_i \approx x(t_i, p)$, 
           suggested time step $h_i$ 

    1) Compute $x_1 \approx x(t_i + h_i, p)$ using method I 
    2) Compute $x_2 \approx x(t_i + h_i, p)$ using method II 
    3) Compute $\delta = \|x_1 - x_2\|$ for some norm $\|\cdot\|$ 
    4) If $\delta <$ some given threshold 
           accept the higher-order of $x_1$ and $x_2$ as $x_{i+1}$ 
           and update $t_{i+1} \leftarrow t_i + h_i$ 
       else 
           $h_i = g(h_i, \delta)$ 
           goto 1) 
       endif 

Fig. 3: Simplified description of a numerical integration algorithm 

Thus, what we really compute as the final value $x^1(p)$ is 

$$x^1(p) = x\bigl(t(p), p\bigr) \qquad (7)$$

(note the dependence of $t$ on $p$). Thus, we obtain 

$$\nabla x_{t=t^1} = \frac{\partial x}{\partial t}\,\nabla t_{t=t^1} + \left.\frac{dx}{dp^T}\right|_{t=t^1} \qquad (8)$$

and with Eq.(1) 

$$\nabla x_{t=t^1} = f(t^1, x^1, p)\,\nabla t_{t=t^1} + \left.\frac{dx}{dp^T}\right|_{t=t^1}. \qquad (9)$$

Note that $\nabla x$ and $\nabla t$ will have been computed by the AD-generated derivative 
code. We observe the following: 



(i). Depending on how the time discretization was chosen, we will obtain 
different values for $\nabla t_{t=t^1}$ and thus for $\nabla x_{t=t^1}$. Most certainly, we will 
not obtain $dx/dp^T|_{t=t^1}$, which is the result desired by most users. 

(ii). If the gradient of the time step, $\nabla h_i$, had been zero at every step, we 
would have $\nabla t_{t=t^1} = 0$ and thus $\nabla x_{t=t^1} = dx/dp^T|_{t=t^1}$, as desired by 
the user. By default, this happens in methods using a fixed step size. 

(iii). Independent of how the time discretization was chosen, we can recover 
the desired solution as 

$$\left.\frac{dx}{dp^T}\right|_{t=t^1} = \nabla x_{t=t^1} - f(t^1, x^1, p)\,\nabla t_{t=t^1}. \qquad (10)$$

These issues are discussed in more detail in a forthcoming paper [9]. 

Note that approaches (ii) and (iii) are really geared toward the sophistic- 
ated AD user. When an integrator code is written, it is probably feasible to 
indicate the places where the next time step is assigned and to indicate that 
an AD tool should treat this statement as constant with respect to differenti- 
ation, resulting in the assignment of a zero gradient. At any rate, unless the 
developer of the integrator provides this information, the considerable sophist- 
ication of these codes makes it difficult for others to extract this information 
from the code. 



While one might take the attitude that this was not really an issue given 
the ‘fix’ (iii), this is not really the case. Even when $dx/dp^T$ is well behaved, 
$\nabla t$ and $\nabla x$ can become very large and can overflow. Furthermore, the user 
of an AD tool may well be unaware of these issues, or may not be able to 
localize the problem since the integrator may be buried under other layers of 
software. However, as shown in the next section, if the final time is handled 
appropriately, we are likely to obtain $\nabla t_{t=t^1} = 0$ and everything works out; 
we suspect that this situation has happened in quite a few AD applications. 

We note that while (ii) and (iii) will result in the right derivatives $dx/dp^T$, 
there is no guarantee that the derivatives will be obtained at the same accuracy 
as the solution $x$, since the guard of the if-statement governing acceptance or 
rejection of a step will not be augmented by AD, and thus will still be 
governed only by the behavior of $x$. Thus, the derivatives obtained by Eq.(10) 
will be consistent, but they may not be as accurate as those obtained by 
solving the sensitivity equations, which are obtained by direct differentiation 
of Eq.(1): 

$$\frac{d}{dp^T}\left(\frac{dx}{dt}\right) = \frac{\partial f}{\partial x^T}\,\frac{dx}{dp^T} + \frac{\partial f}{\partial p^T}, \qquad (11)$$

$$\frac{d}{dt}\left(\frac{dx}{dp^T}\right) = \frac{d}{dt}(\nabla x) = \frac{\partial f}{\partial x^T}\,\nabla x + \frac{\partial f}{\partial p^T}. \qquad (12)$$

It is easy to add the norm of $\nabla x$ to the guard for stepsize control, but an AD 
tool cannot be expected to do so without user guidance. 

The sensitivity equation (12) can also be used to obtain the required 
gradients using the undifferentiated integration algorithm, but usually it 
requires a lot of work to derive the sensitivity equations by hand or with 
formula manipulation programs. Often the integration algorithm is only one part 
of a bigger program. Then it is not trivial to extract the ODE without major 
changes in the original program flow, to derive the sensitivity equations, and 
to reintegrate them back into the program. In [2] an approach similar to Eq.(12) 
is used to combine hand-derived sensitivity equations with automatically 
generated partial derivatives. AD tools, on the other hand, make it possible to 
compute the gradients without much additional effort as soon as the initial 
value problem itself can be formulated and solved correctly. 

4 Handling of Final Time 

So far we intentionally ignored the treatment of the final integration step, but 
we will see that this is an essential task. Several integration algorithms have 
been investigated, and by manual changes like (10) we are able to compute 
correct gradients. 

It is somewhat surprising that with suitable handling of the final time these 
corrections are not required and nevertheless one arrives at the right results. 
In Fig.4a the final time handling is shown for an algorithm which limits the 
last step size such that the last step satisfies $t_{i+1} = t^1$. In comparison with 
the previously described algorithm we have to add only the computation of the 
proposed next time step $h_{i+1}$. The most important information is contained in 
the limitation of the last step size $h_i = t^1 - t_i$, which leads in the 
differentiated code to $\nabla h_i = -\nabla t_i$ if the final time does not depend on the 
parameters. From the time step update $t_{i+1} = t_i + h_i$ we get 
$\nabla t_{i+1} = \nabla t_i + \nabla h_i$, and because for the last step (and only for the 
last step) we have $\nabla h_i = -\nabla t_i$, it follows that 
$\nabla t_{i+1} = -\nabla t_i + \nabla t_i = 0$. Therefore, if the algorithm guarantees 
that the last step is performed at the final time, the correction from (10) is not 
required and it is not even necessary to modify the automatically generated 
code for the gradients. Of course, the correct computation of $\nabla t_i$ for the 
whole time interval is still essential. 
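The effect can be reproduced with a small forward-mode sketch. Everything below is an assumption made for illustration: the model problem x' = -p*x, the Euler/Heun pair standing in for methods I and II, the step controller 0.9*h*sqrt(tol/delta), and the hand-written `Dual` class that plays the role of AD-generated code. With `clamp_last_step=False` the loop simply stops at the first accepted time beyond the final time, so the time reached depends on p and the correction (10) is needed; with `clamp_last_step=True` the last step is limited as described above.

```python
import math


class Dual:
    """A value together with its derivative with respect to the parameter p."""

    def __init__(self, val, dot=0.0):
        self.val, self.dot = float(val), float(dot)

    @staticmethod
    def lift(o):
        return o if isinstance(o, Dual) else Dual(o)

    def __neg__(self):
        return Dual(-self.val, -self.dot)

    def __add__(self, o):
        o = Dual.lift(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__

    def __sub__(self, o):
        return self + (-Dual.lift(o))

    def __rsub__(self, o):
        return Dual.lift(o) + (-self)

    def __mul__(self, o):
        o = Dual.lift(o)
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)
    __rmul__ = __mul__

    def __truediv__(self, o):
        o = Dual.lift(o)
        return Dual(self.val / o.val,
                    (self.dot * o.val - self.val * o.dot) / (o.val * o.val))

    def __rtruediv__(self, o):
        return Dual.lift(o) / self

    def __abs__(self):
        return self if self.val >= 0.0 else -self


def dsqrt(a):
    r = math.sqrt(a.val)
    return Dual(r, a.dot / (2.0 * r))


def rhs(x, p):
    return -(p * x)                      # model ODE x' = -p*x


def integrate(p_val, t_end, clamp_last_step, tol=1e-6):
    p = Dual(p_val, 1.0)                 # seed: derivative of p w.r.t. itself
    t, x, h = Dual(0.0), Dual(1.0), Dual(0.1)
    while t_end - t.val > 1e-12:
        if clamp_last_step and t.val + h.val > t_end:
            h = t_end - t                # gives grad(h) = -grad(t)
        k1 = rhs(x, p)
        x1 = x + h * k1                              # method I: explicit Euler
        x2 = x + 0.5 * h * (k1 + rhs(x1, p))         # method II: Heun
        delta = abs(x1 - x2)
        if delta.val <= tol:             # the guard looks at values only
            t, x = t + h, x2
        if delta.val > 0.0:
            h = 0.9 * h * dsqrt(tol / delta)         # step size now depends on p
    return t, x, p


# Case A: no special handling of the final time, correction (10) required.
t_a, x_a, p_a = integrate(2.0, 1.0, clamp_last_step=False)
corrected_a = x_a.dot - rhs(x_a, p_a).val * t_a.dot
exact_a = -t_a.val * math.exp(-2.0 * t_a.val)

# Case B: last step limited to hit the final time, no correction required.
t_b, x_b, p_b = integrate(2.0, 1.0, clamp_last_step=True)
exact_b = -1.0 * math.exp(-2.0)
```

In case A the raw gradient `x_a.dot` is far from the analytic sensitivity at the time reached, while `corrected_a` agrees with it up to discretization error. In case B the gradient of the final time vanishes and `x_b.dot` can be used directly.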

Another frequently used approach in numerical integration codes to handle 
the final time is shown in Fig.4b, where the integration algorithm continues 
the time integration until the final time has been passed. Then the values 
at the desired final time are computed by interpolation. This has the 
advantage that no costly additional evaluation of the system is required, only 
a computationally cheaper interpolation. The problem, however, is that the 
gradients $\nabla x$ and $\nabla t$ are interpolated as well and usually we have 
$\nabla t_{t=t^1} \neq 0$. A correction using (10) still leads to the correct results, 
but not without user manipulation of the generated code. 

Note that the time discretization and step size control are not necessarily 
the only numerical artifacts. Features such as variable order polynomial in- 
terpolations and projections also depend on the input quantities and influence 
the dependent variables and thus correspond to active variables. Their asso- 
ciated gradients will then also contribute to the finally computed gradients. 
In this case, the correction (10) may then not be sufficient. Thus, if an 
unknown new integration algorithm has to be differentiated, one should carefully 
study which numerical artifacts impact the final solution and correct for their 
influence in the AD-generated derivatives. 

Fig. 4: Handling of the final time 



5 Conclusions 

In this paper it has been shown that Algorithmic Differentiation techniques 
can be successfully applied to numerical integration algorithms by taking 
into account the influence of numerical artifacts such as numerical stepsize 
control in an a-posteriori correction of the AD-computed results. Often, 
AD-computed results will even be correct without any manual correction. Thus, 
AD makes it possible to compute gradients for solutions of many initial value 
problems without manually writing a large amount of code. 

However, it must be emphasized that at least a basic understanding of the 
problem formulation and the numerical algorithms is required to avoid subtle 
pitfalls and sources of errors arising from the discrepancy between the way the 
algorithm arrives at the solution and the mathematical formulation of the 
solution. High-level, but in-depth, knowledge of the underlying algorithm is 
necessary to account for internal (numerical and physical) dependencies to 
obtain correct results. The impact of the computation of the final time step on 
the overall gradients showed the criticality of this issue. 

At the moment, due to the relative novelty of general-purpose AD tools 
for Fortran and C, the discovery of these internal dependencies may not be 
trivial, but this situation is likely to improve as algorithm developers realize 
the advantages of automatically deriving a sensitivity-enhanced version of 
their code, and document the issues relevant for automatic differentiation. 

Refinement Issues in Stochastic 
Multistage Linear Programming^ 



K. Frauendorfer and Ch. Marohn 

University of St. Gallen, Institute of Operations Research, Holzstrasse 15, 
CH-9010 St. Gallen 



Summary. Linear stochastic multistage programs are considered with 
uncertain data evolving as a multidimensional discrete-time stochastic process. 
The associated conditional probability measures are supposed to depend 
linearly on the past. This ensures convexity of the problem and allows 
application of barycentric scenario trees. These approximate the discrete-time 
stochastic process, and provide inner and outer approximation of the value 
functions. 

The main issue is to refine the discretization of the stochastic process 
efficiently, using the nested optimization and integration of the dynamic, 
implicitly given value functions. We analyze and illustrate how errors evolve 
across nodes of the scenario trees. 

Keywords. approximation, discretization, refinement scheme 



1 Introduction 

In various applications decision makers are required to take an uncertain 
future into account. In particular, today’s decisions have to be taken without 
knowing prices or resources. 

Let the uncertain evolvement of data over a finite planning horizon [0, T] 
be described as a multidimensional discrete-time stochastic process = 
0, ... ,T) on a common Borel space with C IR^ compact. In 

many problem statements both prices and demand or supply of resources 
are stochastic and induce specific structural properties of the value functions 
involved. This motivates us to decompose the multidimensional discrete-time 
stochastic process $(\omega_t;\ t = 0, 1, \ldots, T)$ into two stochastic parts, one 
referring to prices $(\eta_t;\ t = 0, 1, \ldots, T)$, the second 
$(\xi_t;\ t = 0, 1, \ldots, T)$ to demand or supply of resources, i.e. 
$\omega_t = (\eta_t, \xi_t)$. 

Let $P$ represent the (regular) joint probability distribution of 
$\omega := (\omega_0, \ldots, \omega_T)$. The associated regular conditional distributions 
with respect to $\omega_t$ are denoted $P_t(\cdot\,|\,\omega_{t-1})$ for $t = 1, \ldots, T$. All 
these have compact support. The time points $t = 0, 1, \ldots, T$ at which 
decisions $u_t$ are taken are supposed to be predefined. Setting 
$\eta := (\eta_0, \ldots, \eta_T)$ and $\xi := (\xi_0, \ldots, \xi_T)$, a corresponding 
mathematical program is written formally as 

$$\min \int \Bigl[\sum_{t=0}^{T} \rho_t(u_t, \eta_t)\Bigr]\, dP(\eta, \xi) \quad \text{s.t.}\quad f_t(u_{t-1}, u_t) \le h_t(\xi_t),\quad t = 0, 1, \ldots, T. \qquad (1)$$

^Research of this report was supported by Schweizerischer Nationalfonds Grant Nr. 
21-39’575.93 

The convention is that negative subscripts of variables indicate decisions 
of the past, negative subscripts of the stochastic data indicate data of the 
past, both of which represent the input data at the present stage t = 0; in 
particular, stochastic data with subscript 0 are currently observed data and, 
hence, deterministic. 

In (1), the cost $\rho_t(\cdot)$ at $t$ is determined by observation $\eta_t$ and 
decision $u_t$. The feasibility region for $u_t$ is supposed to depend on $u_{t-1}$ 
and on the observed outcome $\xi_t$: $f_t(\cdot)$ is a vector-valued function in 
$u_{t-1}, u_t$ and represents the demand for resources, the components of $h_t(\xi_t)$ 
are the supply components of the resources at $t$. Decisions $u_t$ have to be 
selected at $t$ after $(\eta_t, \xi_t)$ is observed, but prior to the observations 
$(\eta_{t+1}, \ldots, \eta_T)$ and $(\xi_{t+1}, \ldots, \xi_T)$. According to this rule, 
nonanticipative or measurable decisions have to be determined, which minimize 
the expected value of the overall cost and which satisfy the constraints. 

The feasibility set, viewed as a constraint multifunction in $\xi_t$, is supposed 
to be strictly nonanticipative and convex compact-valued with a nonempty 
interior for every $\xi_t$. This ensures that for any nonanticipative and 
feasible decision $u_t$ there exist interior feasible and nonanticipative decisions 
$u_{t+1}, \ldots, u_T$ for any sequence of outcomes $(\xi_{t+1}, \ldots, \xi_T)$ (see 
Rockafellar and Wets 1976/1978 [45], [46] and Frauendorfer 1996 [23]). 

Standard dynamic programming arguments yield optimal value functions 
$\phi_t(\cdot)$ corresponding to periods $t = 0, \ldots, T$ (see e.g. Bertsekas 1995 [2], [3]). 
This allows us to write the stochastic multistage program as a sequence of 
nested two-stage programs. Start at period $T$ with $\phi_{T+1}(\cdot) := 0$ and define 
backwards for $t = T, \ldots, 0$: 

$$\phi_t(u_{t-1}, \eta_t, \xi_t) := \min\ \rho_t(u_t, \eta_t) + \int \phi_{t+1}(u_t, \omega_{t+1})\, dP_{t+1}(\omega_{t+1}\,|\,\eta_t, \xi_t) \quad \text{s.t.}\quad f_t(u_{t-1}, u_t) \le h_t(\xi_t). \qquad (2)$$

In case $\rho_t(u_t, \eta_t)$ are continuous convex-concave saddle functions, $f_t(\cdot)$ is 
convex vector-valued, $h_t(\cdot)$ is linear-affine in $\xi_t$, and in case the conditional 
probability distributions $P_{t+1}(\cdot\,|\,\eta_t, \xi_t)$ depend linearly on $(\eta_t, \xi_t)$ and 
are unaffected by the decisions taken, then it has been proven that for 
$t = 1, \ldots, T$ the value functions $\phi_t(u_{t-1}, \eta_t, \xi_t)$ are continuous saddle 
functions, convex in $(u_{t-1}, \xi_t)$ and concave in $\eta_t$ with respect to their 
domains. Under these conditions, referred to as the convex case below, primal and 
dual solvability of the convex programs is ensured in (2). For details we refer 
to Rockafellar and Wets 1976 [45], Frauendorfer 1994/1996 [22] and [23]. 

For ease of exposition it will be helpful to use for (2) the notation 

$$\phi_t(u_{t-1}, \eta_t, \xi_t) := \min\,[\rho_t + E_t\phi_{t+1}](u_t, \eta_t, \xi_t). \qquad (3)$$

One major challenge with (1) is the nested minimizations and 
multidimensional integrations of implicitly given value functions. Unlike many 
control type formulations, no analytical expressions can be expected within 
the stochastic multistage setting due to the non-smoothness of the value 
functions. For overcoming these difficulties numerically in the convex case 
we discretize the stochastic processes with respect to their outcomes, taking 
into account the subdifferentiability of the value functions. 

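Let the conditional probability measures $P_t(\cdot\,|\,\omega_{t-1})$ be successively 
discretized for $t = 0, \ldots, T$, yielding discrete conditional probability measures 
$Q_t(\cdot\,|\,\omega_{t-1})$ with corresponding support $\mathcal{A}_t(\omega_{t-1})$. This way, a 
scenario tree $\mathcal{A}$ and the associated path probabilities $Q(\omega)$ are given 
according to 

$$\mathcal{A} := \{\omega \in \Omega \mid \omega_t \in \mathcal{A}_t(\omega_{t-1})\ \ \forall t \ge 1\}, \qquad Q(\omega) := \prod_{t=1}^{T} Q_t(\omega_t\,|\,\omega_{t-1}). \qquad (4)$$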
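The construction of path probabilities as products of conditional probabilities can be sketched as follows. The two-stage tree, the outcome labels and the function name `enumerate_scenarios` are hypothetical, and for simplicity the conditional probabilities are assumed to depend only on the immediately preceding outcome.

```python
def enumerate_scenarios(root, cond):
    """Return {path: Q(path)}, where Q(path) is the product of the
    conditional probabilities along the path."""
    paths = {(root,): 1.0}
    for stage in cond:
        extended = {}
        for path, prob in paths.items():
            # stage[parent] maps each child outcome to its conditional probability
            for child, q in stage[path[-1]].items():
                extended[path + (child,)] = prob * q
        paths = extended
    return paths


# Hypothetical discrete conditional measures for stages t = 1 and t = 2.
cond = [
    {"w0": {"up": 0.6, "down": 0.4}},
    {"up": {"up": 0.7, "down": 0.3}, "down": {"up": 0.2, "down": 0.8}},
]

scenarios = enumerate_scenarios("w0", cond)
total_probability = sum(scenarios.values())
```

The number of scenarios is the product of the branching factors over the stages, which is why the size of the tree has to be controlled carefully when the discretization is refined.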

Let the projections of $\mathcal{A}$ onto $[0,t]$ be denoted $\mathcal{A}^t$. Clearly, any 
stochastic process which is discrete in both time and state is representable as 
a scenario tree. It is easily seen that, when the conditional probability measure 
$P_{t+1}(\cdot\,|\,\eta_t, \xi_t)$ is discrete with finite support, the stochastic two-stage 
programs in (2) have block structure. Then (1) may be written as a 
mathematical program with a dynamic block structure and high sparsity, whose 
size depends on the number of scenarios within the tree. This indicates why 
the solvability of stochastic multistage programs strongly benefits from 
sophisticated algorithms. Recent works include Rockafellar and Wets 1991 [47], 
Wets 1989 [51], Gassmann 1990 [27], Birge 1985/1994/1995 [5], [7], [6], Mulvey 
and Ruszczynski 1995 [38], Mulvey, Vanderbei and Zenios 1995 [40], Zenios 
1991 [52], Robinson 1991 [44], Ruszczynski 1993 [49], [48], Dantzig 1990/1993 
[13], [14], Edirisinghe and Ziemba 1994/96 [18], [17], [19], Kall, Ruszczynski 
and Frauendorfer 1988 [32], Infanger 1992/1994 [30], [31], Ermoliev and Wets 
1988 [20], Kall and Wallace 1994 [34], Kall and Stoyan 1982 [33], Wets 1989 
[51], Hiller and Eckstein 1994 [29]. 

However, one has to be aware of the fact that discretizing the conditional 
probability measures is a simplification of the real dynamics, which may have 
severe impact on the goodness of the surrogate problem. It is not difficult to 
define conditions based on which the discretization has to be refined, to ensure 
weak convergence of the discrete measures to the real probability measure 
and, hence, epi-convergence of the minimizers. But, due to the fact that the 
number of scenarios grows exponentially, the quality of discretizations has to 
be monitored carefully. For example, applying Monte-Carlo simulation to the 
conditional probability measures with a sample size of 10^ in each of 6 stages 
results in a scenario tree with 10^® possible scenarios. Suffering from the 
curse of dimensionality, the corresponding deterministic equivalent program 
is numerically unsolvable. In the convex case, barycentric approximation 
allows the design of distinguished scenario trees that provide information on 
how accurately the real dynamics are approximated. 

We discuss issues on how to refine the scenario trees taking into ac- 
count the nested optimization and integration of the value functions. Hence, 
this work provides improvements within algorithmic procedures that solve 
stochastic multistage linear programs. 

The paper is organized as follows: Section 2 introduces a multistage 
finance problem and reveals its structural properties, which may be exploited 
within the refinement process. Section 3 briefly reviews barycentric scenario 
trees and the approximation of the value functions, which help overcome 
the difficulties with nested minimizations and multidimensional integrations. 
Section 4 represents the major part of this work and investigates how the 
error evolves along the scenario tree backwards in time, and measures to what 
extent error arises from integration. Section 5 concludes. 

2 Application in banking 

An example from banking is given to illustrate the application of multistage 
stochastic linear programming. Besides linearity, further structural proper- 
ties are inherent in many financial problem statements and, hence, facilitate 
their solvability. Herein, we consider funding of non-fixed rate mortgages 
which face interest rate and prepayment risk. This constitutes an important 
problem for corporate financial managers of Swiss banks due to the risks 
these managers are exposed to. 

Non-fixed rate mortgages offer the clients to finance their mortgages at 
current market rates which strongly correlate with the current interest rate 
level. It should be noted that the national bank not only monitors the mort- 
gage rate but also defines some cap in case the market interest rates are too 
high. This is intended to protect the clients and, on one hand, reduces the 
credit risk for the bank, on the other hand, additional risk is faced with re- 
spect to the funding mechanism. In Switzerland, it happened over a rather 
long period within the last decade that the mortgage rate the clients had to 
pay was even less than the one-year-borrowing rate at which the bank funded 
a considerable part of their business. In such an interest rate environment 
the clients clearly prefer the variable rate mortgages. This increases the vol- 
ume and the funding costs and therefore the risk the bank is exposed to. If 
the interest rate level is low, it has been observed that the mortgage rate is 
kept above a certain floor, causing fixed-rate mortgages to fall even below the 
non-fixed ones. As a natural consequence, clients change their liabilities to 
the fixed-rate mortgages, causing the non-fixed rate volume to decrease in an 
environment where banks would have the possibility to fund their non-fixed 
rate business at low costs. This type of risk is known as prepayment risk. 
Summarizing these dynamics from the viewpoint of the banking industry, one 
observes that the profitability of non-fixed rate mortgages suffers from a high 
interest level as well as from a low interest level. 

The challenge of funding non-fixed rate mortgages is seen in optimizing 
the monthly funding activities taking into account the stochasticity of interest 
rate dynamics and mortgage volume. Depending on the asset and liability 
structure of a bank, the non-fixed rate mortgages are funded to a certain 
extent with bonds of different maturities. The range of these maturities may 
vary between one month and 10 years. Taking into account the liquidity with 
respect to which the various maturities are traded, it is observed that short 
term bonds, say bonds with a maturity of up to 1 year, may be borrowed in 
a considerably larger volume than bonds with a maturity beyond 2 years. In 
addition, the short term rates are much more volatile than long term rates. 

The funding problem, referred to below, has been introduced in 
Frauendorfer 1996 [24] and is discussed with respect to interest rate models in 
Frauendorfer and Schürle 1997 [26]. Herein, this problem is investigated with 
respect to its solvability. 

Let $\mathcal{D} := \{1, 2, \ldots, D\}$ be the set of prescribed times at which bonds 
mature. Taking into account the liquidity within the various maturities, only 
a subset $\mathcal{D}^S \subset \mathcal{D}$ is regarded for funding. Furthermore, it is assumed that 
the bonds are held until maturity, so that changes in the price of a bond 
during the holding period may be relaxed. The volume of bonds borrowed 
at time $t = 0, \ldots, T$ with maturity $d \in \mathcal{D}$ is denoted $v_t^{d,+}$; $v_t^d$ represents the 
total volume of bonds with maturity $d$ at time $t$. Clearly, $v_t^d$ is determined 
by 

$$v_t^d = v_{t-1}^{d+1} + v_t^{d,+}, \quad t = 0, 1, \ldots, T;\ \forall d \in \mathcal{D}^S,$$
$$v_t^d = v_{t-1}^{d+1}, \quad t = 0, 1, \ldots, T;\ \forall d \notin \mathcal{D}^S.$$

The total funding volume at $t$ is given by 

$$x_t = \sum_{d \in \mathcal{D}} v_t^d, \quad t = 0, \ldots, T.$$

Its evolvement over time is determined by the stochastic change $\xi_t \in \mathbb{R}$, 

$$x_t = x_{t-1} + \xi_t, \quad t = 1, 2, \ldots, T.$$
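The bookkeeping of bond volumes can be sketched as follows, assuming the roll-down relation in which the volume with maturity d at time t equals the volume that had maturity d+1 at time t-1 plus any newly borrowed volume. The function name `roll_volumes` and all numbers are hypothetical.

```python
def roll_volumes(prev, new_borrowing, max_maturity):
    """Volumes at time t from volumes at time t-1: every bond moves one
    period closer to maturity, and newly borrowed volume is added."""
    return {
        d: prev.get(d + 1, 0.0) + new_borrowing.get(d, 0.0)
        for d in range(1, max_maturity + 1)
    }


prev = {1: 10.0, 2: 5.0, 3: 0.0}   # volumes at t-1, indexed by maturity d
new = {1: 2.0, 3: 4.0}             # new borrowing at t, fundable maturities only
cur = roll_volumes(prev, new, max_maturity=3)

total_prev = sum(prev.values())    # total funding volume at t-1
total_cur = sum(cur.values())      # total funding volume at t
change = total_cur - total_prev    # must match the stochastic volume change
```

In this example the bonds with maturity 1 at time t-1 mature and leave the portfolio, so the new borrowing of 6.0 leaves the total volume at 11.0, a change of -4.0 relative to the previous total of 15.0.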

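The term structure dynamics may be represented by a finite-dimensional 
discrete-time stochastic process $\eta_t \in \mathbb{R}^K$; $t = 0, \ldots, T$. Let the accrued 
interest payments of a unit borrowed with maturity $d$ be denoted $\rho_t(\eta_t, d)$. 
Setting $v_t := (v_t^d;\ d \in \mathcal{D})$, $v_t^+ := (v_t^{d,+};\ d \in \mathcal{D}^S)$ allows for writing 
the accrued interest payments as an inner product within $\mathbb{R}^{|\mathcal{D}^S|}$: 

$$\langle \rho_t(\eta_t), v_t^+ \rangle = \sum_{d \in \mathcal{D}^S} \rho_t(\eta_t, d) \cdot v_t^{d,+}.$$

The stochastic multistage linear program which minimizes the expected present 
value of interest payments reads as 

$$\begin{array}{lll}
\min & \displaystyle\int \sum_{t=0}^{T} \langle \rho_t(\eta_t), v_t^+ \rangle\, dP(\eta, \xi) & \\[1ex]
\text{s.t.} & v_t^d - v_{t-1}^{d+1} - v_t^{d,+} = 0 & \forall t,\ \forall d \in \mathcal{D}^S \\
 & v_t^d - v_{t-1}^{d+1} = 0 & \forall t,\ \forall d \notin \mathcal{D}^S \\
 & x_t - \sum_{d \in \mathcal{D}} v_t^d = 0 & \forall t \\
 & x_t - x_{t-1} = \xi_t & \forall t \\
 & v_t^{d,+} \ge 0;\ v_t^d,\ x_t \text{ nonanticipative} & \forall t,\ \forall d \in \mathcal{D}.
\end{array} \qquad (5)$$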



Observe that the left-hand side of the constraints (5) is deterministic. This 
fact requires that the stochastic interest payments, which clearly depend on 
the funding decision, have to be incorporated in the dynamics of $\xi_t$. To 
preserve convexity one has to relax the stochastic dependency of the volume 
change on the decisions. Then probability measures must be unaffected by the 
decisions taken. The stochastic interest payments appear in the objective only. 
If, in addition, the conditional probabilities $P_t(\cdot\,|\,\eta_{t-1}, \xi_{t-1})$ depend 
linearly on the observation in period $t-1$, the convexity of the stochastic 
multistage program (5) is preserved and the value functions of stage $t$ 
associated with (5) are saddle functions, concave-convex in $(\eta_t, \xi_t)$ (see 
Frauendorfer 1994/1996 [22], [23]). 

The underlying saddle structure of the value functions motivates the 
application of the barycentric approximation (see Figure 1 and Section 3), 
which optimizes the discretization of the stochastic interest rate and volume 
processes. 




Figure 1: Bilinear approximation of the value function 

An illustration of a scenario tree is given in Figure 2. 

Figure 2: Scenario tree for 2-stage funding problem with interest rate curve 
being driven by three maturities. 



It is further noted that +1 and —1 are the only non-zero coefficients of 
the matrix on the left-hand side of (5). The submatrices associated with the 
various stages t = 1, . . . ,T remain unchanged and are of high sparsity. In 
addition to the low dimensions of the term structure representation, the dy- 
namics of the stochastic right-hand side is characterized by a one-dimensional 
discrete-time process. All these properties increase the solvability of the un- 
derlying problem considerably, as these are exploited by sophisticated algo- 
rithms. 

Due to the progress achieved in methodological developments within 
mathematical programming, stochastic programming has received increasing 
attention in finance. We refer here to the successful and valuable 
contributions of Dantzig 1990/1993 [13], [14], Ziemba 1975/1986/1992/1994/ 
1997 [57], [36], [55], [9], [56], [42], Dempster 1997 [11], Klaassen 1997 [35], 
Dupacova 1991/1992/1997 [15], [16], [1], Mulvey 1992/1994/1996/1997 [41], 
[37], [39], [56], [42], Zenios 1991/1992/1993/1995/1996 [52], [53], [54], [55], [12], 
[28], [50], [43], Hiller and Eckstein 1994 [29], and Cariño et al. 1994/1995/1997 
[9], [10], [8]. 

3 Barycentric scenario trees 

In the dynamic formulation (2), 

$$\int \phi_t(u_{t-1}, \omega_t)\, dP_t(\omega_t \mid \omega_{t-1}) \qquad (6)$$

has to be evaluated within the nested optimization, which poses a serious
challenge. As said, in the convex case the value functions are saddle
functions in (1). This motivates the application of the barycentric
approximation technique, which not only helps carry out the nested
optimization, but also provides information on how accurately the real
dynamics are mapped.

In this section, the barycentric approximation technique is briefly
reviewed, illustrating how the multidimensional integration is bounded from
above and below. For details, see Frauendorfer 1992/1996 [21], [23].

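Since supports are compact, the stochastic outcomes $\omega_t := (\eta_t, \xi_t)$ may be
covered by a ×-simplex $\Omega_t := \Theta_t \times \Xi_t$ for $t = 1, \dots, T$. Taking into account
the dependency of $(\eta_t, \xi_t)$ on the past, one may select the ×-simplicial
coverage conditioned on $\omega_{t-1}$: $\Omega_t(\omega_{t-1}) := \Theta_t(\omega_{t-1}) \times \Xi_t(\omega_{t-1})$.

To a given scenario tree $\mathcal{A}$ of the form (4) and its projections $\mathcal{A}_t$ for
$t = 1, \dots, T$, a ×-simplicial partition of the support of $\omega_t$ may be selected
for any $\omega_{t-1} \in \mathcal{A}_{t-1}(\omega_{t-2})$:

$$\mathcal{C}_t(\omega_{t-1}) := \Big\{ \Omega_t^i(\omega_{t-1}) \;\Big|\; \bigcup_i \Omega_t^i(\omega_{t-1}) = \Omega_t(\omega_{t-1}) \supset \operatorname{supp} P_t(\cdot \mid \omega_{t-1}),\ \Omega_t^i \cap \Omega_t^{i'} = \emptyset,\ \forall i' \ne i,\ i = 1, \dots, I_t \Big\}. \qquad (7)$$

We call the resulting set process $\mathcal{C} := \{\mathcal{C}_t(\omega_{t-1});\ t = 1, \dots, T\}$ a ×-simplicial
coverage process consistent with $\mathcal{A}$ iff $\mathcal{A}_t(\omega_{t-1}) \subset \Omega_t(\omega_{t-1})$ for all
$\omega_{t-1} \in \mathcal{A}_{t-1}(\omega_{t-2})$, $t = 1, \dots, T$.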

For the ease of exposition, we first consider a partition Ct{uJt-i) con- 
sists of one x-simplex, i.e.: Ct{(jOt-i) •= •= x 

©t(a;t_i),Si(a;t_i) are regular simplices in IR^, IR^ with associated vertices 
and 6^,^, i/ = 0, . . . ,if, /i = 0, . . . ,L. Below, and denote 

the barycentric weights of rjt and with respect to the simplices 0^(cjt_i) 
and 

The conditional probability measure Pt{-\ut-i) induces mass distribution 
Mt,v with corresponding generalized barycenter 




on the (L-dimensional) simplices {at^u} x For i/ = 0, . . . let 

(Footnotes: A ×-simplex is a set whose closure is representable as a Cartesian product of simplices.
In the notation, the dependence of the vertices and generalized barycenters on $\omega_{t-1}$ is omitted for simplicity.)

the mass 



be assigned to As for i/ = 0, . . . , X the mass distributions Mt,u 

add up to a conditional probability distribution, one gets a discrete condi- 
tional probability measure Q[{-\ut~i) on 0^(o;^_i) x Et{uJt-i). 

By symmetry, the conditional probability measure $P_t(\cdot \mid \omega_{t-1})$ induces
mass distributions with associated generalized barycenters



•— Z)i/ ^t,v 






( 10 ) 



on the (^^-dimensional) simplices Qt{ut-i) x {6^,^}. Again, for /x = 0, . . . ,L 
let the mass 



^^*,^(©<(^^<-1) X := J >^i^{rit)dPt{vt,^t\i^t-i) (11) 

be assigned to the points bt,^). Analogously, for /x = 0, . . . , L the mass 
distributions Mt,^ add up to a conditional probability distribution, yielding 
a discrete conditional probability measure Qf{-\ut-i) on 0t(ct;i_i) xSt(a;t_i). 

As becomes obvious from (8), (9), (10) and (11), the advantageous feature
from a computational viewpoint is that generalized barycenters and their
probabilities are completely determined by the first moments of $\eta_t$ and
$\xi_t$ and by the cross moments $E(\eta_t \cdot \xi_t)$.

It has been proven in Frauendorfer 1992 [21] that in the convex case the
expression in (6) is approximated from below and above by

$$(E^l \phi_t)(u_{t-1}, \omega_{t-1}) := \int \phi_t(u_{t-1}, \omega_t)\, dQ_t^l(\omega_t \mid \omega_{t-1}) \;\le\; \int \phi_t(u_{t-1}, \omega_t)\, dP_t(\omega_t \mid \omega_{t-1}) \;\le\; (E^u \phi_t)(u_{t-1}, \omega_{t-1}) := \int \phi_t(u_{t-1}, \omega_t)\, dQ_t^u(\omega_t \mid \omega_{t-1}). \qquad (12)$$

This technique has been termed barycentric approximation and can easily be
applied to a partition of the form (7). For details, see Frauendorfer
1992/1994/1996 [21], [22] and [23].
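The bounds in (12) can be checked numerically in the simplest setting. The following is a minimal sketch, assuming one-dimensional $\eta$ and $\xi$ (so both simplices are intervals), an empirical sample standing in for $P$, and an illustrative saddle function chosen here, not taken from the paper. The function name `barycentric_bounds` and all data are hypothetical. Note that only the first moments and the cross moment of the sample enter the two discrete measures.

```python
import numpy as np

def barycentric_bounds(eta, xi, phi, theta, xi_box):
    """Lower/upper barycentric bounds for E[phi(eta, xi)].

    phi is assumed concave in eta and convex in xi; theta = (a0, a1) and
    xi_box = (b0, b1) are intervals covering the samples (1-D simplices).
    The sample (equal weights) plays the role of the measure P.
    """
    eta = np.asarray(eta, dtype=float)
    xi = np.asarray(xi, dtype=float)
    a0, a1 = theta
    b0, b1 = xi_box
    # Barycentric weights of eta w.r.t. [a0, a1] and of xi w.r.t. [b0, b1].
    lam = np.stack([(a1 - eta) / (a1 - a0), (eta - a0) / (a1 - a0)])
    mu = np.stack([(b1 - xi) / (b1 - b0), (xi - b0) / (b1 - b0)])
    # Lower measure: mass at each vertex of theta, placed at the
    # generalized barycenter in the xi-direction.
    mass_l = lam.mean(axis=1)
    xi_bar = (lam * xi).mean(axis=1) / mass_l
    lower = sum(m * phi(a, x) for m, a, x in zip(mass_l, (a0, a1), xi_bar))
    # Upper measure: mass at each vertex of the xi-interval, placed at the
    # generalized barycenter in the eta-direction.
    mass_u = mu.mean(axis=1)
    eta_bar = (mu * eta).mean(axis=1) / mass_u
    upper = sum(m * phi(e, b) for m, e, b in zip(mass_u, eta_bar, (b0, b1)))
    exact = float(np.mean(phi(eta, xi)))
    return float(lower), exact, float(upper)

# Illustrative data: correlated samples and a concave-convex test function.
rng = np.random.default_rng(0)
eta = rng.uniform(0.0, 1.0, 2000)
xi = 0.5 * eta + rng.uniform(0.0, 1.0, 2000)   # lies in [0, 1.5]
phi = lambda e, x: -e**2 + x**2 + e * x        # concave in e, convex in x
lower, exact, upper = barycentric_bounds(eta, xi, phi, (0.0, 1.0), (0.0, 1.5))
```

Each discrete measure has only two atoms here, so evaluating the bounds costs two function evaluations each, compared with one evaluation per sample for the exact average.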

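Let

$$\mathcal{A}_t^l(\omega_{t-1}) := \operatorname{supp} Q_t^l(\cdot \mid \omega_{t-1}), \qquad \mathcal{A}_t^u(\omega_{t-1}) := \operatorname{supp} Q_t^u(\cdot \mid \omega_{t-1}). \qquad (13)$$

Clearly, $\mathcal{A}_0^l = \mathcal{A}_0^u = \omega_0$. Starting with a partition $\mathcal{C}_1^l(\omega_0) = \mathcal{C}_1^u(\omega_0)$,
the associated barycentric approximation yields $Q_1^l(\cdot \mid \omega_0)$ and $Q_1^u(\cdot \mid \omega_0)$ and the
associated supports $\mathcal{A}_1^l(\omega_0)$ and $\mathcal{A}_1^u(\omega_0)$. Hence, the partitions $\mathcal{C}_t^l(\omega_{t-1})$
and $\mathcal{C}_t^u(\omega_{t-1})$ are selected with respect to the support of $Q_t^l(\cdot \mid \omega_{t-1})$ and
$Q_t^u(\cdot \mid \omega_{t-1})$ in an inductive manner. This way, the barycentric scenario trees
and the associated path probabilities are given in analogy to (4):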



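$$\mathcal{A}^l := \{\beta^l = (\beta_0^l, \dots, \beta_T^l) \mid \beta_t^l \in \mathcal{A}_t^l(\beta_{t-1}^l),\ \forall t \ge 1\}, \qquad (14)$$

$$Q^l(\beta^l) := \prod_{t=1}^{T} Q_t^l(\beta_t^l \mid \beta_{t-1}^l), \qquad (15)$$

and

$$\mathcal{A}^u := \{\beta^u = (\beta_0^u, \dots, \beta_T^u) \mid \beta_t^u \in \mathcal{A}_t^u(\beta_{t-1}^u),\ \forall t \ge 1\}, \qquad (16)$$

$$Q^u(\beta^u) := \prod_{t=1}^{T} Q_t^u(\beta_t^u \mid \beta_{t-1}^u). \qquad (17)$$

$\mathcal{A}^{l,t}$ denotes the projection of $\mathcal{A}^l$ onto $[0,t]$, and $\beta^{l,t} = (\beta_0^l, \dots, \beta_t^l)$ the
associated barycentric scenarios (analogously for $\mathcal{A}^{u,t}$ and $\beta^{u,t}$). Note that, by
construction, the barycentric scenario trees $\mathcal{A}^l$, $\mathcal{A}^u$ and the associated
×-simplicial coverage processes $\mathcal{C}^l$, $\mathcal{C}^u$ are consistent.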

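Substituting in (1) the probability measure $P$ by its discretizations $Q^l$
and $Q^u$ yields the associated stochastic multistage programs

$$\min \int \Big[ \sum_{t=0}^{T} \rho_t(u_t, \eta_t) \Big]\, dQ^l(\eta, \xi) \quad \text{s.t. } f_t(u_{t-1}, u_t) \le h_t(\xi_t),\ t = 0, 1, \dots, T, \qquad (18)$$

and

$$\min \int \Big[ \sum_{t=0}^{T} \rho_t(u_t, \eta_t) \Big]\, dQ^u(\eta, \xi) \quad \text{s.t. } f_t(u_{t-1}, u_t) \le h_t(\xi_t),\ t = 0, 1, \dots, T. \qquad (19)$$

The associated value functions are given through the dynamic formulation

$$\psi_t(u_{t-1}, \eta_t, \xi_t) := \min\ \rho_t(u_t, \eta_t) + \int \psi_{t+1}(u_t, \omega_{t+1})\, dQ_{t+1}^l(\omega_{t+1} \mid \eta_t, \xi_t) \quad \text{s.t. } f_t(u_{t-1}, u_t) \le h_t(\xi_t) \qquad (20)$$

and

$$\Psi_t(u_{t-1}, \eta_t, \xi_t) := \min\ \rho_t(u_t, \eta_t) + \int \Psi_{t+1}(u_t, \omega_{t+1})\, dQ_{t+1}^u(\omega_{t+1} \mid \eta_t, \xi_t) \quad \text{s.t. } f_t(u_{t-1}, u_t) \le h_t(\xi_t) \qquad (21)$$

with $\psi_{T+1}(\cdot) = \Psi_{T+1}(\cdot) := 0$.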

It has been proven in Frauendorfer 1994 [22] that (20) and (21) provide
lower and upper approximations $\psi_t(u_{t-1}, \eta_t, \xi_t)$ and $\Psi_t(u_{t-1}, \eta_t, \xi_t)$ of the
value function, i.e.:

$$\psi_t(u_{t-1}, \eta_t, \xi_t) \le \phi_t(u_{t-1}, \eta_t, \xi_t) \le \Psi_t(u_{t-1}, \eta_t, \xi_t). \qquad (22)$$

Solving (18) and (19) yields policies $u^l := (u_0^l, u_1^l, \dots, u_T^l)$ and
$u^u := (u_0^u, u_1^u, \dots, u_T^u)$, where $u_t^l$ and $u_t^u$ denote the decisions made after the
outcome of period $t$ ($t = 1, \dots, T$) in the respective scenario tree is
observed. Hence, upper and lower approximations of the value functions may
correspond to different policies. Both $\psi_t(\cdot)$ and $\Psi_t(\cdot)$ epiconverge to
$\phi_t(\cdot)$ provided the weak convergence of the conditional discrete probability
measures $Q_t^l(\cdot \mid \beta_{t-1}^l)$, $Q_t^u(\cdot \mid \beta_{t-1}^u)$ to $P_t(\cdot \mid \beta_{t-1}^l)$, $P_t(\cdot \mid \beta_{t-1}^u)$,
respectively, is ensured for $t = 1, \dots, T$. This requires that the
sub-×-simplices of the coverage processes become arbitrarily small with
respect to their diameters. In the following section we discuss in what way
this convergence can be monitored, taking into account that lower and upper
approximation refer to different policies and different coverage processes.

4 Error analysis 

As mentioned in the introduction, the difficulty in solving stochastic linear
multistage programs lies in the nested optimization and multidimensional
integration of implicitly given value functions. Discretizing the conditional
probability measures in the barycentric sense helps to overcome these
difficulties and, in the convex case, provides approximate policies for the
underlying problem including bounds on the optimal expected value. However,
lower and upper approximation of the value functions refer to different
scenario trees and associated consistent ×-simplicial coverage processes,
which only coincide with respect to the data at t = 0. To assess how the
inaccuracy of the approximation evolves over time, and thus to monitor the
refinement process efficiently, additional information has to be determined.

For the ease of exposition, the following notations are used: 

. , , (23) 

(E(V’t)(ut-i,wt_i) := j 



Let the scenario trees $\mathcal{A}^l$, $\mathcal{A}^u$ and the associated consistent coverage
processes $\mathcal{C}^l$, $\mathcal{C}^u$ be given and the associated stochastic multistage programs (18)
and (19) be solved. Given a scenario $\beta^{l,t} = (\beta_0^l, \dots, \beta_t^l) \in \mathcal{A}^{l,t}$ up to $t$, decision
$u_t^l$ and its value $\psi_t(u_{t-1}^l, \beta_t^l)$ are available. The stochastic evolvement during
the next period is approximated by the conditional barycentric discretization
$\beta_{t+1}^l \in \mathcal{A}_{t+1}^l(\beta_t^l)$ with the associated minimal values $\psi_{t+1}(u_t^l, \beta_{t+1}^l)$ (see Figure
3); analogously, for $\beta^{u,t} \in \mathcal{A}^{u,t}$ the decision $u_t^u$ and the value $\Psi_t(u_{t-1}^u, \beta_t^u)$ are
known. The stochastic evolvement during the next period is approximated by
the conditional barycentric discretization $\beta_{t+1}^u \in \mathcal{A}_{t+1}^u(\beta_t^u)$ with the
associated minimal values $\Psi_{t+1}(u_t^u, \beta_{t+1}^u)$ (see Figure 4). Due to (22), the following

Figure 3: Scenario tree corresponding to the lower approximation




Figure 4: Scenario tree corresponding to the upper approximation 
inequalities hold : 

(24) 

(25) 

The missing upper bound for P\) is available with the evaluation 

of the missing lower bound for Pt) can be obtained 

with /3t)- In both cases, a (r-t)-stage multistage program must be 

solved with respect to the bary centric scenario trees conditioned on /3|, 
respectively, providing the inner approximation, the outer approximation, 
respectively. Due to the convexity of the value function (t>t in the decision ut^ 
we focus on the minimizer of the outer approximation: This becomes evident 
from Figure 5: Evaluating the inner approximation at the minimizer u[ of 
the outer approximation provides a further bound on i.e.: 

< [Pt + Er$t+i](«l,/3l) (26) 

Note that u\ solves (20) and is feasible for (21) given and (see Figure 

6 ). 

In contrast, the value of the outer approximation at the minimizer
$u_t^u$ of the inner approximation does not necessarily bound $\phi_t(\cdot)$ from below

Figure 5: Lower and upper approximation for pt -f




Figure 6: Evaluation of the upper approximation corresponding 

to each node of the scenario tree A^. 

(see Figure 5), i.e.: 

[Pt + EjV’t+i] «,/?*“) t A“) = min[/9t + (27) 

Note that solves (21) and is feasible for (20) given and 

Above observations let us focus on the barycentric scenario tree which 
corresponds to the lower approximation (see Figure 7), for which immediately 
error bounds can be obtained at any node of that tree by 

et(/3|) := (28) 

The evaluation of the error at each period provides a measure of the
quality of the lower approximation (see Figure 8). If the error at some
node in period $t$ is zero, the approximation of the value function $\phi_t(\cdot)$ and
its minimizers is exact and does not need to be improved beyond this node.

Figure 7: Information known in period t





Figure 8: Error bounds at the nodes of 



Clearly, if $e_t(\beta_t^l)$ is positive at some node, then the error bounds
prior to that node are positive, too.

By definition, $e_{T+1} = 0$. Moving backwards in time, the error bounds
increase. The integration error occurs first and, hence, causes increasing
error within the nested minimization and integration procedure. We shall
investigate here how the error bound $e_t(\cdot)$ evolves. For this purpose, we
summarize the inequalities (24) and (26) according to

[/Jt + E\tl}t+i]{u[,l3l) = ^t) < 

< = [pt + < (29) 

< [pt + E^%M,0D 

where solves (21) given and /Sj. Obviously, 

\pt + EjV’t+i](u<)^t) < [pt + Ej'ft+iKuJj/Jj), (30) 



however 



\pt + E[%+i]{u[,l3i) ^{pt + E^%+i]{u\,l3l), 



(31)

as the approximate value function $\Psi_{t+1}(\cdot)$ need not necessarily be a saddle
function. For the evaluation of [Ej^t+i](U(,y8() and it is 

referred to Figures 9 and 10. Therefore, 




Figure 9: Evaluation of [E“$f+i](U(,;3t) 





Figure 10: Evaluation of [EJ$(+i](u[,;3{) 



6t{fy := - [E[%+i]{u[,l3l), (32) 

may also become negative. Accepting St{l3l) as error estimate associated 
with the integration of (see Figure 11), we conclude that in case 

Sti/Sl) ^ 0, the current discretization has caused an inaccurate integration 
of ^t+i(‘) and, hence, of 0 i 4 -i(-)- In case that St{Pl) is even negative, the 
current approximation of the saddle function 0t+i(-) by cannot be 

accepted as sufficiently accurate, as ^t+i(-) does not even satisfy the saddle 
property. 

We now derive in what way $\delta_t(\beta_t^l)$ contributes to $e_t(\beta_t^l)$. This follows immediately

Figure 11: Error increments at the nodes of



from 

< [pt + Er%+i]{u[,fy - [pt + Ei^t+iH,l3‘) = 



= E^%+i{u[,l3i)-E[i>t+M,l3l) = 



= E^%+M,^1) - + E[^t+M,0l) - ^^t+M,0l) = 



= St{0i) + E ■ QUM+i\0l) =■■ M0D- 

^<+1 ^-^<+1 (^i) 



(33) 



According to the above relation, $\delta_t(\beta_t^l)$ is an upper bound for the error
increment from period $t+1$ to period $t$ conditioned on $\beta_t^l$. Note that

eM) = m) + E et^i{0Ui) • QUiiPWM) = Mm (O.. 



holds in case u[ is a minimizer of [pt + This implies that 

the error increment is due to integration of If, additionaly, 

bilinear, then St{Pt) = 0 ^ind 

^t{Pi) ^ ‘ Qt+i(^t+il^t) = /Q5\ 

0U^eA[^M) 

The above relations are illustrated on a 3-stage funding problem.

The optimal value of the lower approximation is 5906.92 and that of the upper
approximation is 6694.96; the accuracy is 11.77% (see Table 1). Three
refinements have been performed, which have decreased the relative error to
5.81%. It is observed that the upper bound remains unchanged with respect to
these refinements. This is due to the fact that in the underlying funding
problem the inner approximation $\Psi_t(\cdot)$ of the value function has been bilinear
over t = 2, 1, 0, for which case the nested integration and minimization of
$\Psi_t(\cdot)$ is exact, and, hence, the upper bound and the corresponding minimizers
remain unchanged.

# ref.    upper bound    lower bound    accuracy
0         6694.96        5906.92        11.77 %
1         6694.96        6268.61         6.37 %
2         6694.96        6269.41         6.36 %
3         6694.96        6305.73         5.81 %

Table 1: Lower and upper bounds for the first 3 refinements
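The accuracy column of Table 1 is not defined explicitly in the text. The figures are reproduced if accuracy is taken as the gap between upper and lower bound relative to the upper bound; this reading is an inference from the numbers, and the short sketch below (with the hypothetical helper `relative_gap`) recomputes the column under that assumption.

```python
# Rows of Table 1: (refinement number, upper bound, lower bound).
rows = [
    (0, 6694.96, 5906.92),
    (1, 6694.96, 6268.61),
    (2, 6694.96, 6269.41),
    (3, 6694.96, 6305.73),
]

def relative_gap(upper, lower):
    """Gap between the bounds in percent of the upper bound (assumed definition)."""
    return 100.0 * (upper - lower) / upper

# Recompute the accuracy column, rounded to two decimals as in the table.
accuracy = [round(relative_gap(u, l), 2) for _, u, l in rows]
```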




Figure 12: Evolvement of the error $e_t(\beta_t^l)$ backwards in time

The evolvement of $e_t(\beta_t^l)$, $\delta_t(\beta_t^l)$ and $\Delta_t(\beta_t^l)$ is illustrated in Figures 12,
13, and 14. Nodes in Figure 12 at which $e_t = 0$ indicate that inner and outer
approximation of the value functions coincide and are bilinear. Nodes in
Figure 13 at which $\delta_t = 0$ indicate that the inner approximation is bilinear.
Nodes at which $e_t = \Delta_t$ indicate that the error increment from stage $t+1$ to
$t$ is due to inaccurate integration of the inner approximation, with no impact
on the minimizer of the inner and outer approximation. This way, critical
nodes may be assessed beyond which the corresponding coverage process should
be refined. Given a node of $\mathcal{A}^l$ conditioned on which the partitions are
refined, one is faced with the two-stage situation. Theoretically, the various
refinement schemes that have been developed by Kall and Stoyan 1982 [33],
Birge and Wets 1986 [4], Frauendorfer and Kall 1988 [25], Kall, Ruszczyński
and Frauendorfer 1988 [32], Edirisinghe and Ziemba 1994/1996 [18], [17],
[19], are applicable. However, note that contrary to the two-stage stochastic
programs, the recourse function of the nested two-stage formulation within
the multistage problem is only approximately available by the implicitly
given inner and outer approximation. This fact certainly has to be taken
into account and requires further investigation.

Figure 14: Error bounds $\Delta_{t-1}(\beta_{t-1}^l)$ of 3-stage scenario tree of lower
approximation

5 Conclusions 

This work contributes to the solvability of stochastic multistage linear
programs, which suffers from the nested optimization and multidimensional
integration of implicitly given value functions. In the convex case, which
holds if the conditional probability distributions depend linearly on the past
and remain unaffected by the decisions taken, structural properties help to
overcome numerical difficulties to a certain extent. In particular, applying
the barycentric approximation technique yields distinguished scenario trees.
The solution of the associated deterministic equivalent programs provides
lower and upper bounds of the minimal expected costs given the entire planning
horizon [0,T]. The approximate policies refer to different scenario trees and,
hence, may differ, so that the inaccuracies beyond t > 1 cannot be assessed
immediately. The inner and outer approximation of the value function are
implicitly given and may only be compared as long as they refer to the same
history. Due to the convexity of the value function with respect to the
decisions, it has been realized that the outer approximation ensures the
minorization of the minimum values subject to the stages t = 0, 1, ..., T.
This has been the basis to focus on the corresponding scenario tree $\mathcal{A}^l$ and
then to evaluate the upper bound with respect to each node of that tree. This
way the error can be assessed with respect to any history within $\mathcal{A}^l$. It has
been observed that the error caused by integration of the inner approximation
is mainly responsible for the error increments backwards in time. For
determining the critical nodes beyond which the approximation should be
refined, relations have been derived which characterize the total error at
stages t, t + 1 and the integration error that arises from t + 1 to t. Given
a history (i.e., a node) of $\mathcal{A}^l$ conditioned on which the coverage process has
to be refined, one is faced with the two-stage situation. However, for
applying the various refinement schemes that have been developed for
stochastic two-stage programs, it has to be taken into account that the
recourse functions of the nested two-stage formulation within the multistage
setting are only approximately available by the implicitly given value
functions of the surrogate problems. This certainly opens further research
activities which hopefully help increase the solvability and, hence, the
applicability of stochastic multistage programs.

On Solving Stochastic
Linear Programming Problems

P. Kall* and J. Mayer

IOR, University of Zurich, Moussonstr. 15, CH-8044 Zurich



Abstract. Solving a stochastic linear programming (SLP) problem involves
selecting an SLP solver, transmitting the model data to the solver, and
retrieving and interpreting the results. After briefly introducing the SLP
model classes in the first part of the paper, we give a general discussion of
these various facets of solving SLP problems. The second part consists of an
overview of the model-solver connection as implemented in SLP-IOR, our
model management system for SLP. Finally we summarize the main features
and capabilities of the solvers in the collection of solvers presently
connected to SLP-IOR.

Keywords. 90C15 (1991 MSC) 

1 Stochastic linear programs 

In this section we briefly summarize the stochastic linear programming (SLP)
model classes which will be considered in this paper. For a detailed
introduction see Kall [11], Kall and Wallace [19] and Prekopa [28].

SLP with fixed recourse

$$\begin{array}{ll} \min & c^T x + E_\omega Q(x, \omega) \\ \text{s.t.} & Ax \propto b \\ & x \in [l, u], \end{array} \qquad (1.1)$$

where

$$\begin{array}{rl} Q(x, \omega) = \min & q^T(\omega)\, y \\ \text{s.t.} & Wy \propto h(\omega) - T(\omega)x \\ & y \ge 0. \end{array} \qquad (1.2)$$

The symbol $\propto$ stands for any one of $=$, $\le$, $\ge$, row-wise. The following classes
of recourse problems will be considered:

- W (arbitrary) fixed recourse.

- W complete fixed recourse, i.e. $\{z \mid z = Wy,\ y \ge 0\} = \mathbb{R}^m$ (with $m$ the number
of rows of $W$) and $\{u \mid W^T u \le q(\omega)\} \ne \emptyset$ w.p. 1. Note that these assumptions
imply that the recourse problem (1.2) has a feasible solution $\forall x\ \forall \omega$ and an
optimal solution $\forall x$ w.p. 1.

- W simple recourse, i.e. $W = (I, -I)$ and $T(\omega) = T$, $q(\omega) = q$. In this
case the expected value term in the objective of (1.1) becomes separable
with respect to the components of $\chi = Tx$.
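For simple recourse the second-stage LP has a closed-form solution, which is what makes the expected value term separable. The following is a minimal sketch under assumptions not stated in the text: equality constraints $Wy = h(\omega) - Tx$, the cost vector split as $q = (q^+, q^-)$ with $q^+ + q^- \ge 0$, and a discrete distribution of $h$. The function names are hypothetical.

```python
import numpy as np

def simple_recourse_value(x, T, h, q_plus, q_minus):
    """Optimal value of the simple-recourse LP for one realization h.

    With W = (I, -I) and y = (y_plus, y_minus) >= 0, the constraint
    y_plus - y_minus = h - T x is met at least cost by taking the positive
    and negative parts of the residual (assuming q_plus + q_minus >= 0).
    """
    z = np.asarray(h, dtype=float) - np.asarray(T, dtype=float) @ np.asarray(x, dtype=float)
    shortage = np.maximum(z, 0.0)    # y_plus
    surplus = np.maximum(-z, 0.0)    # y_minus
    return float(np.dot(q_plus, shortage) + np.dot(q_minus, surplus))

def expected_simple_recourse(x, T, scenarios, probs, q_plus, q_minus):
    """Expected recourse cost for a discrete distribution of h (illustrative)."""
    return sum(p * simple_recourse_value(x, T, h, q_plus, q_minus)
               for h, p in zip(scenarios, probs))

# Small example: the residual is (2, -1), costs are 1 per unit of shortage
# and 2 per unit of surplus.
value = simple_recourse_value([1.0, 2.0], np.eye(2), [3.0, 1.0], [1.0, 1.0], [2.0, 2.0])
```

Each component of the residual contributes its own term, so the cost is a sum of one-dimensional piecewise linear functions of the components of $Tx$.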

In the model above $\omega \in \Omega$, where $(\Omega, \mathcal{F}, P)$ is a probability space; $q(\omega)$, $h(\omega)$ are
random vectors and $T(\omega)$ is a random matrix. These stochastic parts are
assumed to be given by affine relations in $\xi_1(\omega), \dots, \xi_r(\omega)$, these being
random variables with a known joint probability distribution. The stochastic
independence assumption for this type of models will mean the stochastic
independence of $\xi_1(\omega), \dots, \xi_r(\omega)$.

Under mild assumptions the SLP problem with fixed recourse (1.1) is a convex
programming problem, see e.g. Kall and Wallace [19].

SLP with a joint chance constraint

$$\begin{array}{ll} \min & c^T x \\ \text{s.t.} & P(\{\omega \mid Tx \ge h(\omega)\}) \ge \alpha \\ & Ax \propto b \\ & x \in [l, u], \end{array} \qquad (1.4)$$

with $0 < \alpha < 1$ being some (high) probability level. The probability
distribution of the random vector $h(\omega)$ is assumed to be known.

Problem (1.4) is also called an SLP model with a probabilistic constraint.
We gave the model formulation with a single probabilistic constraint because
this case will be considered later on. Let us notice however that the general
formulation may involve several constraints of this type.

For a broad class of multivariate probability distributions with an existing
density function (including the nondegenerate multinormal distribution) the
SLP problem with a joint chance constraint (1.4) is a convex programming
problem, see e.g. Kall and Wallace [19] and Prekopa [28].

Notice that problem (1.4) has been formulated with a deterministic technology
matrix $T$. The reason is that under a random technology matrix $T(\omega)$ the
problem becomes in general nonconvex even for a multinormal distribution.

SLP with separate chance constraints

$$\begin{array}{ll} \min & c^T x \\ \text{s.t.} & P(\{\omega \mid t_i^T(\omega)\, x \ge h_i(\omega)\}) \ge \alpha_i,\ \forall i \\ & Ax \propto b \\ & x \in [l, u], \end{array} \qquad (1.5)$$

with $0 < \alpha_i < 1$, $\forall i$, given (high) probability levels and $t_i^T(\omega)$ denoting the
$i$-th row of $T(\omega)$. The joint probability distribution of $(t_i(\omega), h_i(\omega))$ is
assumed to be given $\forall i$ and we also assume that these random vectors are
stochastically independent.

Notice that from the purely theoretical point of view (1.5) could be
considered as a special case of (1.4) with several joint chance constraints
and random technology matrices $T(\omega)$. The separately chance constrained
problem (1.5) is, however, practically the only numerically tractable subclass
of jointly chance constrained problems with a random technology matrix. This
is the reason for introducing chance constrained models as they stand above.

For certain distributions (including multinormal) and certain probability
levels the SLP problem with separate chance constraints (1.5) turns out to be
a convex programming problem, see e.g. Kall and Wallace [19] and Marti [23].

All three model classes involve in general multidimensional integrals. In the 
recourse case an additional difficulty is rooted in the fact that the integrand 
is only implicitly given as the optimal value function of a parametric LP. 
Because of this feature SLP models are numerically hard problems. 

If we replace the random variables in the above models by their expected
values (assuming their existence), deterministic LP models result. We will
call them underlying LPs in the sequel.

2 Algebraic equivalents



In some special cases it is possible to reformulate SLP models as mathematical
programming problems involving only functions explicitly given by algebraic
formulas. We call these equivalent MP problems algebraic equivalents in order
to distinguish them from the so-called deterministic equivalents (for the
latter see e.g. Kall and Wallace [19]). Below we discuss those algebraic
equivalents which will be addressed later on in the paper.

Recourse problems, discrete distribution

For a discrete distribution the expected value in the objective of (1.1)
becomes a sum and the optimal value function can be eliminated at the cost
of introducing an additional vector variable for each one of the realizations.



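Let us assume that $(q^k, h^k, T^k)$, $k = 1, \dots, N$, are the joint realizations of
$(q(\omega), h(\omega), T(\omega))$ with corresponding probabilities $p_k$, $k = 1, \dots, N$. The
algebraic equivalent will be the following LP problem:

$$\begin{array}{ll} \min & c^T x + \sum_{k=1}^{N} p_k\, (q^k)^T y^k \\ \text{s.t.} & Ax = b \\ & T^k x + W y^k = h^k, \quad k = 1, \dots, N \\ & x \ge 0 \\ & y^k \ge 0\ \ \forall k. \end{array} \qquad (2.6)$$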



This LP problem has a so-called dual block angular structure. Considering
r = 9 independent random variables, the number of diagonal blocks of the
problem is:

• $N = 3^9 = 19683$ for 3 realizations for each random variable;

• $N = 5^9 = 1953125$ for 5 realizations for each random variable.
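The growth of the equivalent LP can be made concrete with a few lines of arithmetic. The sketch below assumes the block structure of (2.6), i.e. one copy of the second-stage rows and columns per joint realization; the first- and second-stage dimensions used in the example are hypothetical and serve only as illustration.

```python
def equivalent_lp_size(m1, n1, m2, n2, r, k):
    """Size of the algebraic equivalent LP for r independent random
    variables with k realizations each.

    m1 x n1 is the size of A, m2 x n2 the size of W (illustrative inputs).
    Returns (number of blocks N, total rows, total columns).
    """
    n_blocks = k ** r                 # joint realizations = diagonal blocks
    rows = m1 + n_blocks * m2         # first-stage rows + one W-block per realization
    cols = n1 + n_blocks * n2         # x plus one recourse vector per realization
    return n_blocks, rows, cols

# Block counts quoted in the text for r = 9.
blocks_3, rows_3, cols_3 = equivalent_lp_size(10, 20, 5, 8, r=9, k=3)
blocks_5, _, _ = equivalent_lp_size(10, 20, 5, 8, r=9, k=5)
```

Even with a second stage of only 5 rows and 8 columns, three realizations per variable already give an LP with close to 100 000 rows.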



Thus the algebraic equivalent LPs are typically large scale problems having
a special structure. The fact that such an equivalent LP exists does not
imply that all recourse problems with a discrete distribution can be solved by
just solving the equivalent LP. As the above example illustrates, the size of
the problem grows rapidly with an increasing number of realizations of the
components of the random vector and with an increasing number of random
variables. The size of the problem may easily grow to an extent where it is
impossible to generate the equivalent LP, not to speak of solving it.



Separate chance constraints 

In the special case when in (1.5) only the right-hand side is stochastic, the
problem can obviously be reformulated as an LP based on quantiles.
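The quantile reformulation can be sketched as follows. If only $h_i(\omega)$ is random with a continuous, strictly increasing distribution function $F_i$, the constraint $P(t_i^T x \ge h_i(\omega)) \ge \alpha_i$ is equivalent to the linear constraint $t_i^T x \ge F_i^{-1}(\alpha_i)$. The example assumes a normally distributed right-hand side purely for illustration; the helper name and the numbers are hypothetical.

```python
from statistics import NormalDist

def quantile_rhs(mean, std, alpha):
    """Deterministic right-hand side replacing a random h_i ~ N(mean, std^2):
    the alpha-quantile of its distribution (normality is an assumption)."""
    return NormalDist(mean, std).inv_cdf(alpha)

# Illustrative data: demand with mean 100 and standard deviation 10 must be
# covered with probability at least 0.95.
rhs = quantile_rhs(100.0, 10.0, 0.95)

# Any x with t_i^T x >= rhs then satisfies the chance constraint, since
# P(h_i <= rhs) equals alpha by construction.
coverage = NormalDist(100.0, 10.0).cdf(rhs)
```

The resulting row is an ordinary linear constraint, so the whole model can be passed to a general purpose LP solver.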

In the general case, for certain multivariate distributions and probability
levels, convex programming algebraic equivalents exist. For the case of
multinormal distributions and $\alpha_i \ge 0.5$ see e.g. Kall and Wallace [19].

3 Solving SLP problems 

The solution phase plays a crucial role in the modeling life-cycle of SLP
models, see Kall and Mayer [15]. In this section we briefly summarize the
various facets of solving SLP problems. In the sequel, by a solver we mean a
computer implementation of a solution algorithm.

Access to solvers 

For solving an SLP model, first of all access to a solver is needed. For
moderately sized recourse problems with a discrete distribution the problem
may be solved by formulating the algebraic equivalent LP (2.6) and by solving
it e.g. by a readily available commercial LP solver. The same is true for
those separately chance constrained problems where an algebraic equivalent
exists. When only the right-hand side is stochastic, a general purpose LP
solver can again be utilized; otherwise an NLP solver is needed.

For realistically sized recourse problems and for jointly chance constrained
problems specialized SLP solvers are needed. The difficulty is that, to our
knowledge, no commercial SLP solvers exist and the existing SLP solvers are
located at various academic institutions. One of the purposes of this paper
is to provide information on how SLP solvers can be accessed.

Selecting an appropriate solver

Let us first emphasize that unlike in the LP case, in the SLP case there does
not exist a general SLP solver capable of solving all of the various SLP
model types.

The main problem features on which the selection of an appropriate solver
for a given SLP model instance depends are the following: the type of the
model, which parts of the problem are stochastic, the stochastic dependency
structure and probability distribution of the random variables, and
the dimensionality limitations imposed by the solver and by the computing
environment. This implies that selecting an SLP solver usually presupposes
some technical knowledge of solver capabilities.

Solver input dataformat 

Let us assume that a solver has been selected. The next issue is to transform
model data into the solver's input data format. Notice that this conversion
may also involve a model conversion when the solver aims at an algebraic
equivalent.

A standard input data format, S-MPS, exists for recourse problems, including
multistage models, see Birge et al. [1]. This is an extension of the
well-known linear programming data format MPS. An SLP model instance
can be specified in three text files. The first one serves for specifying the
underlying LP, the second one for pinpointing the random entries and fixing
their probability distribution, whereas the third file defines the stages. The
first one of these files is basically an MPS file; for writing it an algebraic
modeling language like GAMS (see Brooke et al. [2]) can be utilized.

For solvers not endowed with the capability of reading S-MPS, or for solvers
aiming at models not covered by the S-MPS data format, data must be
formatted according to the specific input requirements of the solver. In both
cases the data specification for a solver can become quite a problem for large
scale models.

For large scale problems an additional difficulty is to find data errors:
Such problems must usually be debugged like a computer program.
Debugging is extremely difficult with data being in the solver's input format
(including S-MPS). Besides debugging, repeated runs also occur when solving
variants of a model instance. This implies that repeated conversion between
a "readable" data format like a spreadsheet or an algebraic modeling language
and the solver's input format should be supported.

Solver parameters 

As already mentioned, SLP problems are numerically hard. SLP algorithms
either treat them as large scale LPs or directly face the problem of dealing
with multidimensional integrals. This implies that solving an SLP problem
may involve several runs with various settings of the solver parameters (e.g.
various tolerances); "tuning" the parameters plays an important role. We
experienced this problem even with SLP solvers which solve the algebraic
equivalent LP.

Selecting an appropriate setting of the solver parameters is especially
important for the stochastic algorithms; the performance of the corresponding
solvers largely depends on the parameter settings. It is very important to
provide some guidance on parameter selection for the users of these
algorithms.

Specification of solver parameters is either implemented as command line 
parameters or by employing “SPECS” or “OPTIONS” files. The format of 
these files largely depends on the solver, this being true even for commercial 
LP solvers. 

Output of results 

Solvers usually write the solution into an output file which is in most cases
"readable", meaning that the solution is tabulated for the sake of easy
comprehension. For large scale models huge tables arise which must be further
processed for judging the solution or for analyzing it. This means that the
user might wish to load the solution into his own working environment for
further processing. The difficulty is that different solvers may write quite
differently formatted output tables, i.e. again data format conversion is
involved, which should be automated.

Assessing the quality of a solution 

As mentioned above solving an SLP problem may involve several runs, and 
it is very important to judge the quality of the current solution. 

Let us consider recourse problems first. For large scale problems it may be
very costly or even impossible to compute a single exact objective function
value. In the case of the stochastic algorithms it is especially important to
judge the quality of the solution. Successive discrete approximation methods
play an important role in this respect: They provide lower and upper bounds
on the optimal objective value.

For jointly chance constrained problems the difficulty is to compute the prob- 
ability involved in the chance constraint. In this case Boole-Bonferroni type 
inequalities can be used for the purpose of assessing a solution. 
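The simplest inequalities of this type need only the marginal probabilities of the individual rows. As a minimal sketch (the first-order bounds only; the sharper variants discussed in the cited literature also use pairwise joint probabilities), with the hypothetical helper `bonferroni_bounds`:

```python
def bonferroni_bounds(marginals):
    """First-order bounds on the probability that all events A_i hold,
    given only the marginal probabilities P(A_i).

    lower: 1 - sum_i (1 - P(A_i)), truncated at 0  (Boole's inequality)
    upper: min_i P(A_i)
    """
    lower = max(0.0, 1.0 - sum(1.0 - p for p in marginals))
    upper = min(marginals)
    return lower, upper

# Illustrative marginals of three rows of a joint chance constraint.
lower, upper = bonferroni_bounds([0.99, 0.98, 0.97])
```

If the lower bound already exceeds the required level $\alpha$, the current solution is certified feasible for the joint constraint without evaluating a multidimensional integral.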

For the bounds mentioned above see Kall and Wallace [19] and Prekopa [28].

4 SLP-IOR: The solver interface

This section is devoted to an overview of the solver interface of SLP-IOR, a
model management system for SLP developed by the authors, see Kall and
Mayer [13], [14], [16]. Presently we are working on a further development of
the system by also including multistage models. SLP-IOR is freely available
for academic purposes; the current version is for IBM PC/AT 486 (or higher)
computers running under MS-DOS.

As discussed in the previous section, one of the main difficulties in
dealing with different solvers is the handling of the various solver
input/output data formats. The main idea in the design of the solver
interface of SLP-IOR is the following: we utilize the well documented
solver interface of the algebraic modeling system GAMS, see Brooke et
al. [2], for connecting the SLP solvers to SLP-IOR.

A solver run consists of the following steps: 

• SLP-IOR writes the model instance in the GAMS modeling language. 
As GAMS does not include facilities for representing random variables 
we use our own conventions for specifying the random variables data. 

• GAMS reads the model instance and subsequently outputs LP data 
according to the GAMS interface format and random variable data 
according to our format convention. The LP data correspond either to 
an algebraic equivalent or otherwise to the underlying LP problem. 

• In the next step these data are converted by SLP-IOR to the input 
dataformat of the solver. 

• The solver is started up. 

• After solver termination results are converted by SLP-IOR to the GAMS 
format. 

• GAMS reads the results and writes a listing file which documents the 
model instance as well as the run characteristics and the results. For 
the sake of easy retrieval the solution is also written into a separate 
text file. 

• SLP-IOR retrieves the results. 

The single deviation from this scheme is for solvers with input data in S-MPS 
format: For efficiency reasons in this case the S-MPS files are directly gen- 
erated by SLP-IOR. 

An obvious advantage of the outlined approach is that a uniform
interface arises: all solvers have to be interfaced according to the GAMS
interface format. Admittedly, if uniformity were the only issue, a much
simpler user interface could be built by defining and using a solver
interface format of our own.

Besides uniformity our approach has however further advantages: This way 
we have immediate access to the powerful general purpose MP solvers of 
GAMS, which can e.g. be used for solving algebraic equivalents for compar- 
ative purposes. Another advantage is that a documentation of the model 
and of the computational results is automatically available in the modeling 
language GAMS. A further reason which does not concern solvers is the fol- 
lowing: SLP models frequently arise as stochastic versions of an underlying 
LP. According to present day modeling standards the user should be provided 
with the important facility to formulate this LP in an algebraic modeling lan- 
guage and import it afterwards into SLP-IOR for subsequently building the 
stochastic variants. We employ GAMS also for this purpose. 

The main features of the solver interface of SLP-IOR can be summarized as 
follows: 

• A wide variety of solvers is connected to SLP-IOR. 

• Selecting an appropriate solver is supported by providing a list of ap- 
propriate solvers for the current model instance. 

• Model data are automatically transformed to the solver’s input format. 

• Setting the solver’s parameters and repeated runs occur in an interac- 
tive menu driven fashion. 

• Solver output results are automatically retrieved e.g. for further analy- 
sis. 

Technical note: All solvers connected to SLP-IOR are instances of a general 
solver class in the object oriented sense. SLP problems are themselves in- 
stances of model classes. Selecting an appropriate solver for a model instance 
is implemented as follows: The solver instances in turn inspect the current 
model instance and send a message to the model manager component whether 
they consider themselves appropriate for solving that model instance. This 
feature facilitates connecting solvers to SLP-IOR. 
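
The selection mechanism described in the technical note can be sketched as follows. This is only an illustration in Python; the class and attribute names are hypothetical and do not reproduce the actual SLP-IOR implementation. The model types and distributions attached to the example solvers follow Table 6.1.

```python
from dataclasses import dataclass


@dataclass
class ModelInstance:
    model_type: str      # e.g. "FR", "CR", "SR", "JC" (abbreviations of Table 6.1)
    distribution: str    # e.g. "dd", "cd", "nd"


class Solver:
    """General solver class; each connected solver is an instance of a subclass."""
    name = "generic"
    model_types = frozenset()
    distributions = frozenset()

    def accepts(self, model):
        # The solver itself inspects the model instance and reports its verdict.
        return (model.model_type in self.model_types
                and model.distribution in self.distributions)


class Dapprox(Solver):
    name = "DAPPROX"
    model_types = frozenset({"CR"})
    distributions = frozenset({"dd", "cd"})


class Procon(Solver):
    name = "PROCON"
    model_types = frozenset({"JC"})
    distributions = frozenset({"nd"})


def appropriate_solvers(solvers, model):
    """Model manager side: collect the solvers that consider themselves appropriate."""
    return [s.name for s in solvers if s.accepts(model)]


candidates = appropriate_solvers([Dapprox(), Procon()], ModelInstance("CR", "cd"))
```

With this design, connecting a new solver only requires a new subclass; the model manager code that builds the list stays unchanged.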

5 SLP-IOR: Connecting solvers 

Solvers can be connected to SLP-IOR according to the following categories: 

An external tool is an external piece of software (besides a solver it can
e.g. be a text editor) which is just started up by SLP-IOR; after its
termination control simply goes back to SLP-IOR. Connecting such a tool can
be performed in a menu driven fashion.

An external solver is a solver which receives the data of the current model
instance but returns no computational results to SLP-IOR after performing
its tasks (solver is to be understood quite broadly in this context; it may
e.g. be a model analysis system). This facility can also be used to loosely
connect an SLP solver to SLP-IOR, provided that the data input format of the
solver is one of the formats available in SLP-IOR (e.g. S-MPS). The
connection can again be established in a menu driven way.

A GAMS solver is one of the solvers of the user's GAMS system; connecting
it to SLP-IOR is again guided by menus. This is a close connection: the
GAMS solvers participate in all system operations in the same way as the
internal solvers.

An internal solver is a solver connected to SLP-IOR in the closest possible 
way. Connecting a solver this way may involve minor changes in the source 
of the solver and sometimes also in the source of SLP-IOR. 

The rest of this section is devoted to discussing the connection of internal 
solvers and is intended for the technically interested reader. 

We only connect solvers to SLP-IOR as internal solvers when the source of 
the code is available. 

One of the reasons for making minor changes in the solver source is measur- 
ing elapsed time. In order to perform reasonable comparative computational 
studies the elapsed time returned by the solvers should have the same mean- 
ing for all solvers. This is unfortunately usually not the case: Some solvers 
measure e.g. preprocessing time or I/O time separately, others just return to- 
tal elapsed time. When comparing solver performance the total elapsed time 
is of interest whereas comparing algorithms requires comparison of elapsed 
time of the solution procedure part. Let us notice that the latter is not unam- 
biguous, sometimes important solver specific transformations in the prepro- 
cessing phase are not measured as part of the solution time of an algorithm. 
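
The distinction between the two notions of elapsed time can be made concrete with a small timing wrapper. The sketch below is an assumption about how such a measurement could be organized; `preprocess` and `solve` are hypothetical stand-ins for the corresponding phases of a solver and are not part of any solver mentioned here.

```python
import time


def timed_run(preprocess, solve, data):
    """Run both phases and report preprocessing, solution and total time separately."""
    t0 = time.perf_counter()
    prepared = preprocess(data)      # e.g. reading input, presolve transformations
    t1 = time.perf_counter()
    result = solve(prepared)         # the solution procedure proper
    t2 = time.perf_counter()
    times = {
        "preprocessing": t1 - t0,
        "solution": t2 - t1,         # relevant when comparing algorithms
        "total": t2 - t0,            # relevant when comparing solvers
    }
    return result, times


# Toy usage with stand-in phases.
result, times = timed_run(sorted, sum, [3, 1, 2])
```

Reporting all three numbers for every solver gives them the same meaning across solvers, which is the purpose of the source changes described above.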

Below we summarize the main points concerning changing the source code of 
the solver or of SLP-IOR. 



— Solver input format: If this is not S-MPS, we first look into the solver
code to find out whether the data input part can safely be changed to
read data in one of the formats already available in SLP-IOR. If so, the
change is carried out; otherwise, as a last resort, the code of SLP-IOR
is changed: the list of available data formats is augmented by the new
data format.

— Solver output: A small piece of code is inserted into the solver source
for outputting termination status, elapsed time, solution etc. If this is
not possible because of the complexity of the code, this information is
extracted from the solution listing (i.e. the code of SLP-IOR must
be changed).

— Solver parameters: In SLP-IOR the user can specify solver parameters
interactively before a solver run. This is implemented as follows: when
solver parameters are implemented as command line parameters, these are
simply offered to the user for selection. For solvers employing option
files, a default option file is offered for editing before the run.

Let us emphasize that for solvers not developed by ourselves we ask for the
author's permission before making changes in the solver source.

6 SLP-IOR: Solvers connected 

In this section we first give a list of solvers which are either connected to
SLP-IOR or which are planned to be connected in the near future. For each
solver the underlying solution algorithm is listed first, followed by the name
of the solver, its developers and references. For details see the specified
references, Kall and Wallace [19] and the following survey papers: Kall [12],
Kall, Ruszczynski and Frauendorfer [17], Mayer [25], Prékopa [27] and Wets [34].

Fixed recourse, algebraic equivalent 

• L-shaped method. Van Slyke and Wets [33], further developed by Gass- 
mann [7]. MSLiP, Gassmann 1992. The solver capabilities include also 
multistage problems. Connecting a new version is in progress. 

• Basis reduction method, Strazicky [31]. 

• Regularized decomposition, Ruszczynski [29]. QDECOM, Ruszczynski 1985;
a new version DECOMP, Ruszczynski and Świętanowski [30].

• General purpose simplex solvers. XMP, Marsten [21], and the GAMS 
solvers CONOPT, MINOS5, OSL, ZOOM. 

• General purpose interior point solvers: BPMPD, Meszaros [26]; HOPDM,
Gondzio [8]; OB1, Marsten et al. [22] and the GAMS solver OSL.

Complete recourse, original problem

• Successive discrete approximation, Kall and Stoyan [18], Frauendorfer
and Kall [5], Frauendorfer [4]. DAPPROX, Kall and Mayer 1994.

• Stochastic Quasigradient method, Ermoliev [3], Gaivoronski [6]. 

• Stochastic decomposition, Higle and Sen [9], [10]. SDECOM, Kall
and Mayer 1993¹.

Simple recourse, original problem 

• Successive discrete approximation, Kall and Stoyan [18].
SRAPPROX, Kall and Mayer 1992.

• Convex hull method of van der Vlerk [20], for integer recourse. 
SIRD2SCR, van der Vlerk and Mayer 1993. 

Joint chance constraints, multinormal distribution 

• Supporting hyperplane type method, Szantai [32]. PCSP, new imple- 
mentation, Szantai, 1996, (also Dirichlet and Gamma distributions); 
PCSPIOR, Mayer, 1995. 

• Reduced gradient type method, Mayer [24]. PROCON, new imple- 
mentation, Mayer 1995. 

• Central cutting plane type method, Mayer, 1995. PROBALL, Mayer, 
1995. 

The tables below summarize the main characteristics of those solvers which
are presently connected to SLP-IOR.

Table 6.1 gives a summary of the solver characteristics. The column Models
shows the appropriate model types; the column St. Parts indicates the model
parts which may be stochastic (for the notation see Section 1). The column
Distr. shows the allowed probability distributions. Although cd in this
column generally stands for continuous distributions, please notice that in
the present version this means uniform, normal or exponential distributions.
The column Avail. shows the availability of the code. GAMS in this column
indicates a commercial GAMS solver, available with GAMS. SLP-IOR stands for
an internal solver of SLP-IOR which is distributed along with SLP-IOR in
executable form. A literature reference in this column indicates that the
solver is licensed, i.e. a license from the author is needed for using it.

In Table 6.2 an asterisk indicates that the solver is appropriate for the cor- 
responding model type. This table does not contain possible independence 
requirements for the random variables, for this information see Table 6.1. 

¹ Implemented with generous support by Higle and Sen.

Table 6.1 Solver characteristics

Solver      Models    St. Parts   Distr.       Avail.
BPMPD       FR, SC    T, h, q     dd           SLP-IOR
CONOPT      FR, SC    T, h, q     dd           GAMS
DAPPROX     CR        T, h        i, dd, cd    SLP-IOR
HOPDM       FR, SC    T, h, q     dd           [8]
MINOS5      FR, SC    T, h, q     dd           GAMS
MSLiP       FR        T, h, q     dd           [7]
OB1         FR, SC    T, h, q     dd           [22]
OSL         FR, SC    T, h, q     dd           GAMS
PROBALL     JC        h           nd           SLP-IOR
PROCON      JC        h           nd           SLP-IOR
PCSPIOR     JC        h           nd           SLP-IOR
QDECOM      FR        T, h, q     dd           SLP-IOR
SDECOM      CR        T, h        i, dd, cd    SLP-IOR
SIRD2SCR    SIR       h           dd           SLP-IOR
SRAPPROX    SR        h           dd, cd       SLP-IOR
XMP         FR, SC    T, h, q     dd           [21]
ZOOM        FR, SC    T, h, q     dd           GAMS

Table 6.2 Solvers versus models

Solver      FR    CR    CR    SR    SR    SIR   JC    SC
            dd    dd    cd    dd    cd    dd    nd    cd
BPMPD       *     *     -     *     -     -     -     *
CONOPT      *     *     -     *     -     -     -     *
DAPPROX     -     *     *     *     *     -     -     -
HOPDM       *     *     -     *     -     -     -     *
MINOS5      *     *     -     *     -     -     -     *
MSLiP       *     *     -     *     -     -     -     *
OB1         *     *     -     *     -     -     -     *
OSL         *     *     -     *     -     -     -     *
PROBALL     -     -     -     -     -     -     *     -
PROCON      -     -     -     -     -     -     *     -
PCSPIOR     -     -     -     -     -     -     *     -
QDECOM      *     *     -     *     -     -     -     -
SDECOM      -     *     *     *     *     -     -     -
SIRD2SCR    -     -     -     -     -     *     -     -
SRAPPROX    -     -     -     *     *     -     -     -
XMP         *     *     -     *     -     -     -     *
ZOOM        *     *     -     *     -     -     -     *

FR, CR, SR and SIR: fixed, complete, simple continuous and simple integer
recourse. JC and SC: joint and separate chance constraints. dd, cd, nd:
discrete, continuous, normal distributions; i: independence.

On an On/Off Type Source
with Long Range Correlations

Abstract. In this paper we study the ON/OFF model of a telecommunication
traffic source. The duration of the state ON in this model has a heavy-tailed
probability distribution with infinite variance. We prove that the Index of
Dispersion for Counts of the traffic generated by such a source is unbounded
as t increases to infinity. This means that the traffic possesses long-range
dependence.

Keywords. Heavy-tailed distribution, long-range dependence, Index of
Dispersion for Counts



1 Introduction 

Measurements on a LAN at Bellcore [1] have shown that LAN traffic
has long-range dependence. A few models covering this phenomenon have been
proposed. Norros [4] applied Fractional Brownian Motion to model the
LAN arrival process. Veitch [5] proposed a model which could be named a
"fractional renewal process". J. Le Boudec [3] used a five-stage semi-
Markov process for modelling self-similar data traffic. A very interesting and
simple model was developed by M. Villen [6]. This model is based on Poisson
arrivals of bursts whose durations are random variables with infinite
variance.

All the models mentioned above differ from those used up till now. In this
paper we will try to show that the traditional ON/OFF model can capture the
long-range dependence of LAN traffic. We only assume that the duration of
state ON has a heavy-tailed distribution with infinite variance. We use the
Pareto distribution, because it is heavy-tailed and it was suggested in [8]
that such a distribution describes the duration of state ON well.



2 Description of the model 

We consider an ON/OFF source which generates traffic with intensity $\lambda(t)$:

$$ \lambda(t) = d \cdot X(t), \qquad (1) $$

where $d$ is the peak rate of the source and $X(t)$ is a stochastic process defined as
follows:

$$ X(t) = \begin{cases} 1 & \text{if the source is active (in state ON) at moment } t, \\ 0 & \text{if the source is not active (in state OFF) at moment } t. \end{cases} $$

An example of a trajectory of the process $X(t)$ is presented in Figure 1.




We assume that $(\tau_i)$ and $(\eta_i)$, the durations of the ON and OFF states, are
sequences of i.i.d. random variables with distribution functions

$$ F(t) = P\{\tau_i < t\}, \qquad G(t) = P\{\eta_i < t\}, \qquad i = 1, 2, \ldots $$

where

$$ F(t) = \begin{cases} 0, & t < a, \\ 1 - (a/t)^{\alpha}, & t \ge a, \quad a > 0. \end{cases} \qquad (2) $$

This means that the duration of state ON has a Pareto distribution. We suppose
that $\alpha \in (1, 2)$; thus the mean value of the random variables $(\tau_i)$ is

$$ \theta_1 = a + \frac{a}{\alpha - 1} = \frac{\alpha a}{\alpha - 1} \qquad (3) $$

and their variance is infinite. For $(\eta_i)$ we assume that they have a finite
expected value:

$$ \theta_2 = \int_0^{\infty} (1 - G(t))\, dt < +\infty. $$

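Let $Y(t)$ be the amount of traffic arriving in the interval $(0, t)$. Taking
(1) into account we have:

$$ Y(t) = \int_0^t \lambda(y)\, dy = d \int_0^t X(y)\, dy. \qquad (4) $$

We assume that the process $X(t)$ is stationary and $X(0) = 1$. This implies
that [7]:

$$ F_1(t) = \frac{1}{\theta_1} \int_0^t (1 - F(y))\, dy $$

for the distribution function $F_1$ of the remaining duration of the initial ON state.

We would like to know whether the process $Y(t)$ possesses long-range dependence.
To check this, the Index of Dispersion for Counts (IDC) will be derived for $Y(t)$.
The IDC is defined as follows [1]:

$$ IDC(t) = \frac{D^2 Y(t)}{E\,Y(t)}. \qquad (5) $$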

It is known that

$$ E\,Y(t) = E\left( \int_0^t d\, X(y)\, dy \right) = d \int_0^t E\,X(y)\, dy = d \int_0^t P\{X(y) = 1\}\, dy $$

and

$$ E\,Y(t) = d \cdot p_{ON} \cdot t, $$

where $p_{ON} = P\{X(t) = 1\} = \dfrac{\theta_1}{\theta_1 + \theta_2}$ for a stationary renewal process.

Now the variance of $Y(t)$ will be derived. It is known that

$$ D^2 Y(t) = d^2\, D^2\!\left( \int_0^t X(y)\, dy \right) = d^2 \cdot 2 \int_0^t (t - \tau)\, K_X(\tau)\, d\tau, \qquad (6) $$

where

$$ K_X(\tau) = E\big( (X(t+\tau) - E\,X(t+\tau)) (X(t) - E\,X(t)) \big). \qquad (7) $$

By assumption, the process $X(t)$ is stationary. It follows that

$$ E\,X(t+\tau) = E\,X(t) = p_{ON}, $$

thus

$$ K_X(\tau) = E\,X(t+\tau) X(t) - p_{ON}^2. \qquad (8) $$

It remains to derive $E\,X(t+\tau)X(t)$. From the definition of $X(t)$ we obtain

$$ E\,X(t+\tau) X(t) = P\{X(t+\tau) = 1,\ X(t) = 1\} = P\{X(t+\tau) = 1 \mid X(t) = 1\} \cdot P\{X(t) = 1\}, $$

and from [7]:

$$ P\{X(t+\tau) = 1 \mid X(t) = 1\} = \frac{1}{\theta_1} \int_{\tau}^{\infty} R(y)\, dy + \int_0^{\tau} R(\tau - v)\, dH_2(v), \qquad (9) $$

where $R(t) = 1 - F(t)$ and $H_2(t)$ is the expected number of transitions of the
process $X(t)$ from state 0 to state 1 in the interval $(0, t)$, given that $X(0) = 1$.

We now find approximations for the two parts of the right-hand side of equation (9).
We know from the assumptions that $R(t) = a^{\alpha} t^{-\alpha}$ for $t \ge a$. Hence, for $\tau \ge a$:

$$ \frac{1}{\theta_1} \int_{\tau}^{\infty} R(y)\, dy = \frac{1}{\theta_1} \int_{\tau}^{\infty} a^{\alpha} y^{-\alpha}\, dy = \frac{a^{\alpha}}{\theta_1} \cdot \frac{1}{\alpha - 1}\, \tau^{-\alpha+1}. \qquad (10) $$

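From Smith's theorem [7] it is known that

$$ \lim_{\tau \to \infty} \int_0^{\tau} R(\tau - v)\, dH_2(v) = \frac{1}{\theta} \int_0^{\infty} R(y)\, dy, $$

where $\theta = \theta_1 + \theta_2$. It is clear that

$$ \frac{1}{\theta} \int_0^{\infty} R(y)\, dy = \lim_{\tau \to \infty} \frac{1}{\theta} \int_0^{\tau} R(y)\, dy. $$

Therefore we can use the following approximation for $\tau \to \infty$:

$$ \int_0^{\tau} R(\tau - v)\, dH_2(v) \approx \frac{1}{\theta} \int_0^{\tau} R(y)\, dy = \frac{a}{\theta} + \frac{a^{\alpha}}{\theta} \cdot \frac{1}{\alpha - 1} \left( a^{-\alpha+1} - \tau^{-\alpha+1} \right). \qquad (11) $$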



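From (9)-(11) we obtain for $\tau \to \infty$:

$$ P\{X(t+\tau) = 1 \mid X(t) = 1\} \approx \frac{a^{\alpha}}{\theta_1} \cdot \frac{1}{\alpha - 1}\, \tau^{-\alpha+1} + \frac{a}{\theta} + \frac{a^{\alpha}}{\theta} \cdot \frac{1}{\alpha - 1} \left( a^{-\alpha+1} - \tau^{-\alpha+1} \right) $$

$$ = \frac{a^{\alpha}}{\alpha - 1}\, \tau^{-\alpha+1} \left( \frac{1}{\theta_1} - \frac{1}{\theta} \right) + \frac{\alpha a}{\theta (\alpha - 1)}. $$

Substituting this into the expressions for $K_X(\tau)$ and $D^2 Y(t)$, we finally obtain

$$ D^2 Y(t) \approx A\, t^{-\alpha+3}, \qquad (14) $$

where the constant $A = 2 d^2 \cdot \ldots \cdot \frac{1}{2 - \alpha} \ldots$ does not depend on $t$.

Now we can evaluate $IDC(t)$ for $t \to \infty$:

$$ IDC(t) \approx \frac{A\, t^{-\alpha+3}}{d\, p_{ON}\, t} = \frac{A}{d\, p_{ON}}\, t^{-\alpha+2}. \qquad (15) $$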
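
The order of growth of the variance can be checked with a short calculation. The sketch below assumes that the covariance behaves like $K_X(\tau) \approx C\,\tau^{-\alpha+1}$ for large $\tau$, where $C > 0$ is a placeholder for the constant factor and is not a symbol from the original text; the contribution of small $\tau$ is ignored.

```latex
2\int_0^t (t-\tau)\,C\,\tau^{1-\alpha}\,d\tau
  = 2C\left(\frac{t^{3-\alpha}}{2-\alpha} - \frac{t^{3-\alpha}}{3-\alpha}\right)
  = \frac{2C}{(2-\alpha)(3-\alpha)}\,t^{3-\alpha}
```

Both integrals converge at $\tau = 0$ because $\alpha < 2$. Multiplying by $d^2$ gives a variance of order $t^{3-\alpha}$, and dividing by $E\,Y(t)$, which is linear in $t$, leaves a factor $t^{2-\alpha}$ in the IDC.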



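We see that $IDC(t) \to +\infty$ as $t \to \infty$, because $\alpha \in (1, 2)$. Thus the process
$Y(t)$ possesses long-range dependence. It is possible to evaluate the Hurst
parameter $H$.

For a second-order self-similar process $Z$ we have [4]:

$$ D^2 Z(p \cdot t) = p^{2H}\, D^2 Z(t). \qquad (16) $$

In our model, according to (14), we have

$$ D^2 Y(p \cdot t) \sim A \cdot t^{-\alpha+3} \cdot p^{-\alpha+3} \sim p^{-\alpha+3} \cdot D^2 Y(t), \qquad (17) $$

and the relation between $H$ and $\alpha$ is the following:

$$ 2H = -\alpha + 3, $$

thus

$$ H = \frac{-\alpha + 3}{2} \quad \text{and} \quad \alpha = 3 - 2H. \qquad (18) $$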

3 Model verification



In order to verify the proposed model, some simulation experiments have been
carried out. Traffic has been generated according to the proposed ON/OFF
source model. We assume that the parameters of the considered source have the
following values:

• peak rate d = 10 Mbit/s;

• the duration of state ON is a random variable with Pareto distribution; the
Hurst parameter is H = 0.8, which implies that the parameter α = 1.4;

• the minimal duration of state ON is a = 55 μs, which follows from the minimal
packet size in Ethernet (72 bytes) and the peak rate;

• the duration of state OFF is an exponentially distributed random variable
with expected value θ2 = 40 ms.
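
A minimal sketch of how such a source could be simulated is given below. It is not the authors' simulation program: the function names are assumptions, and the Pareto durations are drawn by inverting the distribution function (2). The parameter values are those listed above, expressed in seconds.

```python
import random


def alpha_from_hurst(H):
    """Relation (18): alpha = 3 - 2H."""
    return 3.0 - 2.0 * H


def mean_on(alpha, a):
    """Mean ON duration theta_1 = alpha * a / (alpha - 1), equation (3)."""
    return alpha * a / (alpha - 1.0)


def pareto_sample(alpha, a, rng):
    """Inverse-transform sample from F(t) = 1 - (a/t)^alpha, t >= a."""
    u = 1.0 - rng.random()           # uniform on (0, 1], avoids u == 0
    return a * u ** (-1.0 / alpha)


def simulate_cycles(alpha, a, theta2, n_cycles, seed=0):
    """Return a list of (on_duration, off_duration) pairs for n_cycles cycles."""
    rng = random.Random(seed)
    return [(pareto_sample(alpha, a, rng), rng.expovariate(1.0 / theta2))
            for _ in range(n_cycles)]


alpha = alpha_from_hurst(0.8)        # parameter of the Pareto distribution
a = 55e-6                            # minimal ON duration in seconds
theta2 = 40e-3                       # mean OFF duration in seconds
cycles = simulate_cycles(alpha, a, theta2, n_cycles=1000)
on_fraction = sum(on for on, _ in cycles) / sum(on + off for on, off in cycles)
```

Because the ON durations have infinite variance, the empirical fraction of time spent in state ON converges slowly to $\theta_1 / (\theta_1 + \theta_2)$, so long runs are needed before such estimates stabilize.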

Results of simulation are presented in the following Figures. 

Fig. 2a. Time unit = 100 s (Pareto)

Figures 2a-6a present the intensity of traffic from the source proposed in this
paper for five different time scales. The intensity of traffic is measured in
the number of packets (ATM cells) per time unit. Starting with a time unit of
100 s, each subsequent plot is obtained from the previous one by increasing the
time resolution by a factor of 10 and randomly choosing the next subinterval.
This traffic can be compared with that generated by a common ON/OFF source in
which the durations of the states ON and OFF are exponentially distributed with
the same expected values as in the previous model. Results for the second
model are presented in Figures 2b-6b.

It is easy to see that plots 2a-6a are "similar" to one another, whereas plots
2b-6b differ for different time scales: for a small time unit the traffic is
bursty, but for larger time scales it is too "smooth".

Figures 7a and 7b show the IDC obtained from simulation and analytically.
The two plots are consistent.

We proposed a modification of the known ON/OFF model of a LAN traffic
source. We assume that the duration of state ON has a Pareto distribution with
infinite variance. This assumption implies that the traffic generated by the
source possesses long-range dependence. The advantages of the presented model
are the following:

• simplicity compared with the other models of LAN traffic
proposed in the last few years;

• the fact that ON/OFF models are commonly used to study ATM
networks.

Abstract. In this paper a method for constrained minimization of functions
is outlined. This method is similar to the method developed by Hasofer/Lind 
and Rackwitz/Fiessler; but firstly it can be generalized to problems with 
several constraints and secondly under slight regularity conditions its conver- 
gence can be demonstrated. 

Further it is shown that the sequential quadratic programming schemes, 
which produce an approximate Hessian of the Lagrangian, can be used easily 
for calculating SORM approximations, since the determinant of this Hessian 
divided by the squared length of the gradient of the limit state function is 
the inverse of the square of the SORM correction factor. 

Keywords. Structural reliability, constrained minimization, SORM approx- 
imations, Lagrange multipliers, asymptotic approximations. 

1 Introduction 

In the usual structural reliability formulation, the state of a structure is
modelled by a random vector $X$ denoting the basic random variables which
describe the loads, the material properties and the geometry. Let $g(x)$ be
the limit state function; then the failure domain is given by

$$ F = \{x;\ g(x) < 0\}. \qquad (1) $$

We now have to calculate the probability of failure, given by

$$ P(F) = \int_{g(x) < 0} f(x)\, dx, \qquad (2) $$

with $f(x)$ the joint p.d.f. of the random vector $X$.

All analytic methods, as well as methods which use analytic approximations
as a starting point, need a numerical algorithm for finding points on the limit
state surface $G = \{x;\ g(x) = 0\}$ where a function is minimal with respect to
this surface. In the case of a standard normal distribution, it is necessary to
find the points with minimal distance to the origin, i.e. the points $x^*$ where

$$ |x^*| = \min_{g(x) = 0} |x|. \qquad (3) $$

In the general case we consider the log-likelihood of the joint p.d.f. defined
by $l(x) = \ln(f(x))$, i.e. we seek points where

$$ l(x^*) = \max_{g(x) = 0} l(x). \qquad (4) $$

In both cases this is equivalent to maximizing the joint p.d.f. $f(x)$ of the
random vector $X$ on the limit state surface. Concepts for then calculating
the asymptotic approximations for the failure probabilities are outlined in [2]
and [3].

Liu and Der Kiureghian [6] compared several algorithms without coming to 
a conclusive result which one is the best. This is not surprising; depending on 
the structure of the reliability problem, for different cases different methods 
may be preferable. All algorithms use only information about the function 
and its gradient. 

In sequential quadratic programming methods this information is used to
approximate the Hessian of the Lagrangian by an updating scheme. Therefore
it is necessary here to store an $n \times n$ matrix and to solve a linear
equation system with $n + 1$ equations at each step.

A method often used in structural reliability is the one proposed by
Hasofer/Lind and Rackwitz/Fiessler [10], in the following abbreviated as the
HL-RF method.

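The HL-RF method computes the next point $x_{k+1}$ by linearizing the function
$g(x)$ at $x_k$ and computing the point on the hyperplane $g_L(x) = 0$ with
minimal distance to the origin. This point is then the next point $x_{k+1}$, i.e.

$$ x_{k+1} = |\nabla g(x_k)|^{-2} \left( x_k^T \nabla g(x_k) - g(x_k) \right) \nabla g(x_k). \qquad (5) $$

If we define the unit vector $\alpha^{(k)}$ by

$$ \alpha^{(k)} = |\nabla g(x_k)|^{-1}\, \nabla g(x_k), \qquad (6) $$

we can write this as

$$ x_{k+1} = \left( x_k^T \alpha^{(k)} - \frac{g(x_k)}{|\nabla g(x_k)|} \right) \alpha^{(k)}. \qquad (7) $$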
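
A single HL-RF step according to (5) can be sketched in a few lines. The function below is an illustrative assumption and not code from any of the cited implementations; the limit state function in the usage example is a made-up linear one, for which one step already lands on the point of the hyperplane closest to the origin.

```python
def hlrf_step(x, g, grad_g):
    """One HL-RF step: x_new = |grad g|^-2 * (x^T grad g - g(x)) * grad g."""
    gr = grad_g(x)
    norm2 = sum(c * c for c in gr)                     # |grad g(x)|^2
    scale = (sum(xi * ci for xi, ci in zip(x, gr)) - g(x)) / norm2
    return [scale * c for c in gr]


# Hypothetical linear limit state function g(x) = 3 - x_1 - x_2.
def g(x):
    return 3.0 - x[0] - x[1]


def grad_g(x):
    return [-1.0, -1.0]


x_next = hlrf_step([2.0, 0.0], g, grad_g)
```

Note that the step contains no check of whether the new point is an improvement, which is the source of the convergence problems mentioned in the text for nonlinear limit state functions.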



To improve the convergence properties, a modification of the HL-RF method
was proposed by Veneziano et al. ([12]; [4], p. 156). At each iteration step
the unit vector $\alpha^{(k)}$ was replaced by the vector

$$ \ldots + (1 - \psi)\, \ldots \qquad (8) $$

where the value of $\psi$ was chosen differently for even and odd step numbers to
avoid periodic jumps. The values were $\psi = 0.4$ for odd and $\psi = 0.39$ for
even step numbers.

Other modifications were proposed in the paper by Hohenbichler et al. [5] 
and in Liu and Der Kiureghian [6]. But for both of them, the original method 
and the modifications, some convergence problems remained. 



2 The linearization method of Pshenichnyj 

In this section we will describe a minimization method developed by
Pshenichnyj [9]. This book is a translation of a Russian/Ukrainian book which
appeared in 1983. A paper by the same author [8] outlining the method was
already mentioned in [1]. There it was also noted that the HL-RF method
appears to be a simplified form of this algorithm. Pshenichnyj developed this
method especially for the minimization of functions under constraints. It can
be applied to equality and inequality constraints.

In the following we will consider only the case of one equality constraint
$g(x) = 0$, i.e.

$$ \min f(x) \quad \text{under} \quad g(x) = 0. \qquad (9) $$

The first step is to linearize the functions $f$ and $g$ around the starting point
$x_0$ of the search. They are replaced by

$$ f_L(x) = f(x_0) + \nabla f(x_0)^T (x - x_0), \qquad (10) $$

$$ g_L(x) = g(x_0) + \nabla g(x_0)^T (x - x_0). \qquad (11) $$

We have the problem of finding a vector $d$ such that

$$ f_L(x_0 + d) = \min_{g_L(x_0 + d) = 0} f_L(x_0 + d). \qquad (12) $$

Using now the method of Lagrange multipliers, we obtain the following
equation system:

$$ \nabla f(x_0) + \lambda \nabla g(x_0) = o, \qquad (13) $$

$$ \nabla g(x_0)^T d = -g(x_0). \qquad (14) $$

But this problem will have a solution only if the gradients are parallel.

The modification proposed by Pshenichnyj is to add a term to the function
to be minimized, in order to make the equation system solvable and to avoid
too large steps in the algorithm. The additional term is $|d|^2/2$, i.e. the
squared norm of the step divided by two. So the problem is now to find the
minimum of

$$ f(x_0) + \nabla f(x_0)^T d + |d|^2/2 \qquad (15) $$

under the constraint $g_L(x_0 + d) = 0$. Using again Lagrange multipliers, we
find

$$ \nabla f(x_0) + d + \lambda \nabla g(x_0) = o, \qquad (16) $$

$$ \nabla g(x_0)^T d = -g(x_0). \qquad (17) $$

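This gives

$$ d = -(\nabla f(x_0) + \lambda \nabla g(x_0)), \qquad (18) $$

$$ \nabla g(x_0)^T d = -g(x_0). \qquad (19) $$

Replacing the vector $d$ in the last equation by the right-hand side of (18)
gives for $\lambda$

$$ -g(x_0) = -\nabla g(x_0)^T (\nabla f(x_0) + \lambda \nabla g(x_0)), \qquad (20) $$

$$ \lambda = |\nabla g(x_0)|^{-2} \left( g(x_0) - \nabla g(x_0)^T \nabla f(x_0) \right). \qquad (21) $$

So the solution vector is

$$ d = -\left[ \nabla f(x_0) + |\nabla g(x_0)|^{-2} \left( g(x_0) - \nabla g(x_0)^T \nabla f(x_0) \right) \nabla g(x_0) \right]. \qquad (22) $$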

Now there comes an additional modification. If we calculate the new point
$x_0 + d$, we do not know whether this point is "better" than the starting point.
But if at a point $x^*$ the function $f(x)$ has a constrained minimum under
$g(x) = 0$, the function $f(x) + N|g(x)|$, the augmented Lagrangian, has an
unconstrained minimum there if $N > 0$ is large enough (for details see [9],
chap. 1.2.8). So by checking whether the value of the augmented Lagrangian

$$ H_N(x) = f(x) + N|g(x)| \qquad (23) $$

is less than at the starting point, we see whether the new point is "better". If
the value has not decreased, then instead of taking $x_0 + d$ as the next point,
the step length is decreased by repeatedly multiplying it by 0.5 until we find a
point $x_0 + \alpha d$ with $H_N(x_0 + \alpha d) < H_N(x_0)$. In the figure the
iteration step length is $0.5d$ to give a decrease in the augmented Lagrangian
function.

The difference between this method and the HL-RF algorithm is that the
latter directly minimizes the function $|x|^2$ under the constraint $g_L(x) = 0$
without checking whether the computed new point is in some sense "better" than
the last one. This can lead to non-convergent behavior. The method of
Pshenichnyj instead ensures, by the addition of the term $|d|^2/2$ to the function
to be minimized, that the step length does not become too large; further, the
check of the value of the augmented Lagrangian guarantees convergence towards
a stationary point of the Lagrangian under slight regularity conditions
(see [9], p. 45-9).

It should be noted that the direction used in this method is the same as the
one Liu/Der Kiureghian [6] obtain by introducing a merit function to improve
the robustness of the HL-RF method. The difference is that, as outlined in
the next paragraph, the line search done here aims at a decrease in
the augmented Lagrangian and not in the merit function.

Figure 1: An iteration step of the linearization algorithm

3 The algorithm 

The proposed method for minimizing a function $f(x)$ under the constraint
$g(x) = 0$ can be summarized as follows:

1. Set $k = 1$. Choose a starting value $N$ for the augmented Lagrange
function $H_N(x)$ and a value $\epsilon$. Choose a starting point $x_k$.

2. For the point $x_k$ calculate the coefficient $\lambda_k$ by

$$ \lambda_k = |\nabla g(x_k)|^{-2} \left( g(x_k) - \nabla g(x_k)^T \nabla f(x_k) \right) \qquad (24) $$

and the new search direction $d_k$ by

$$ d_k = -\left[ \nabla f(x_k) + \lambda_k \nabla g(x_k) \right]. \qquad (25) $$

3. Set $\alpha_k = 1$ and then divide by two until the inequality

$$ f(x_k + \alpha_k d_k) + N|g(x_k + \alpha_k d_k)| < f(x_k) + N|g(x_k)| - \epsilon \alpha_k |d_k|^2 \qquad (26) $$

is satisfied.

4. Take as the next point in the iteration

$$ x_{k+1} = x_k + \alpha_k d_k. \qquad (27) $$

5. If we have for the factor $N$ of the augmented Lagrangian that

$$ N < \lambda_k, \qquad (28) $$

set

$$ N = 2\lambda_k. \qquad (29) $$

6. Set $k = k + 1$ and return to step 2.

If this method is used for finding the beta point, the function to be minimized
is $f(x) = |x|^2$ and its gradient is $\nabla f(x) = 2x$.

How should $N$ and $\epsilon$ be chosen? Since $N$ should be larger than the
Lagrange multiplier $\beta |\nabla g(x^*)|^{-1}$ at the beta point, it should be
taken large enough at the beginning, for example $N = 2|x_0|/|\nabla g(x_0)|$.
If it is too small, it will be increased automatically in step 5. The value of
$\epsilon$ should be a number between 0.1 and 0.5. In the scheme above no
stopping criterion is included. Usually the algorithm should be stopped if the
step lengths become too small.
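
The six steps can be put together in a short program. The following is a sketch under stated assumptions and not the author's code: the stopping rule (norm of the search direction below a tolerance), the cap on the number of halvings, the default values of `N` and `eps`, and the use of the absolute value of the multiplier in step 5 are choices made for this illustration, and the limit state function in the example is a made-up linear one.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def linearization_method(f, grad_f, g, grad_g, x0,
                         N=10.0, eps=0.1, tol=1e-10, max_iter=100):
    """Minimize f(x) subject to g(x) = 0 following steps 1-6 of the text."""
    x = list(x0)
    for _ in range(max_iter):
        gf, gg = grad_f(x), grad_g(x)
        lam = (g(x) - dot(gg, gf)) / dot(gg, gg)          # coefficient (24)
        d = [-(a + lam * b) for a, b in zip(gf, gg)]      # direction (25)
        d2 = dot(d, d)
        if math.sqrt(d2) < tol:                           # assumed stopping rule
            break
        merit = f(x) + N * abs(g(x))                      # augmented Lagrangian (23)
        alpha = 1.0
        for _ in range(60):                               # halving, inequality (26)
            trial = [xi + alpha * di for xi, di in zip(x, d)]
            if f(trial) + N * abs(g(trial)) < merit - eps * alpha * d2:
                break
            alpha *= 0.5
        x = trial                                         # next point (27)
        if N < abs(lam):                                  # (28)-(29); abs() is an assumption
            N = 2.0 * abs(lam)
    return x


# Beta point search: f(x) = |x|^2 with a hypothetical limit state g(x) = 3 - x_1 - x_2.
x_star = linearization_method(
    f=lambda x: dot(x, x),
    grad_f=lambda x: [2.0 * c for c in x],
    g=lambda x: 3.0 - x[0] - x[1],
    grad_g=lambda x: [-1.0, -1.0],
    x0=[2.0, 0.0],
)
beta = math.sqrt(dot(x_star, x_star))
```

In this example the full step is accepted in the first iteration, while the second iteration needs one halving because the full step only moves along the constraint without decreasing the augmented Lagrangian.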

Modifications of this algorithm for the case of several limit state functions 
are given in [9], chap. 2. There are given also some modifications similar 
to the quadratic programming schemes outlined in the next paragraph to 
improve the convergence velocity in the final steps of the algorithm, but they 
do not make a reconstruction of the modified Hessian. 

4 Quadratic programming and the SORM-factor

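In this paragraph we will now consider a different topic, i.e. how to calculate
the SORM factor in an easy way. In sequential quadratic programming
methods a point $(x^*, \lambda^*)$ is sought where

$$ \nabla f(x^*) + \lambda^* \nabla g(x^*) = o, \qquad g(x^*) = 0, \qquad (30) $$

i.e. a stationary point of the Lagrangian. This is done by trying to find
approximately the Jacobian of these functions, i.e. the matrix

$$ L(x, \lambda) = \begin{pmatrix} H_f(x) + \lambda H_g(x) & \nabla g(x) \\ \nabla g(x)^T & 0 \end{pmatrix} \qquad (31) $$

and then using Newton's method to calculate the step $d_k$ from a point $x_k$ to
the next point $x_{k+1}$ by

$$ \begin{pmatrix} x_{k+1} \\ \lambda_{k+1} \end{pmatrix} = \begin{pmatrix} x_k \\ \lambda_k \end{pmatrix} + \begin{pmatrix} d_k \\ \kappa_k \end{pmatrix}, \qquad (32) $$

where $(d_k, \kappa_k)$ is the Newton step for the system (30).
Here $\kappa_k = \lambda_{k+1} - \lambda_k$ is the change in the Lagrange multiplier. To ensure
convergence the step length is usually modified by making some line search
along the direction defined by $d_k$, i.e. some point $x_k + \alpha_k d_k$ with
$0 < \alpha_k \le 1$ is taken as $x_{k+1}$.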

The characteristic of these methods is that the matrix is not computed di- 
rectly, but reconstructed approximately from the differences of the gradients 
at the various steps of the algorithm. So the numerical calculation of second 
derivatives is avoided. A widely used algorithm of this form was developed 
by Schittkowski [11]. 
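As an illustration of the Newton step on the system (30), the following minimal sketch uses the exact matrix (31) with $f(x) = |x|^2$ (so $H_f = 2I$). It is not the quasi-Newton scheme of [11], which reconstructs the matrix from gradient differences; the function names `solve_linear` and `newton_kkt_step` and the linear test constraint are assumptions made for this example only.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def newton_kkt_step(x, lam, g, grad_g, hess_g):
    """One full Newton step for grad f + lam * grad g = 0, g = 0, f(x) = |x|^2."""
    n = len(x)
    gg = grad_g(x)
    hg = hess_g(x)
    # Bordered matrix (31): upper-left block is H_f + lam * H_g with H_f = 2 I.
    kkt = [[(2.0 if i == j else 0.0) + lam * hg[i][j] for j in range(n)] + [gg[i]]
           for i in range(n)]
    kkt.append(gg[:] + [0.0])
    # Right-hand side: negative residual of the stationarity system (30).
    rhs = [-(2.0 * x[i] + lam * gg[i]) for i in range(n)] + [-g(x)]
    step = solve_linear(kkt, rhs)
    d, kappa = step[:n], step[n]
    return [x[i] + d[i] for i in range(n)], lam + kappa


# Hypothetical linear limit state g(x) = 3 x1 + 4 x2 - 10; its beta point is
# the point of the line closest to the origin, (1.2, 1.6), with beta = 2.
g_lin = lambda x: 3.0 * x[0] + 4.0 * x[1] - 10.0
grad_lin = lambda x: [3.0, 4.0]
hess_lin = lambda x: [[0.0, 0.0], [0.0, 0.0]]

x_new, lam_new = newton_kkt_step([0.0, 0.0], 0.0, g_lin, grad_lin, hess_lin)
```

For a linear constraint and quadratic objective a single full step reaches the stationary point; for nonlinear limit state functions the step would be damped by the line search described above.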

If we compare the computation of the new step in these methods with the
linearization method, we see that the linearization method for one equality
constraint is obtained from the sequential quadratic programming scheme by
taking the identity matrix instead of the matrix $H_f(x) + \lambda H_g(x)$.

The SORM approximation for the failure probability takes into account the 
additional information of the second derivatives of the limit state function at 
the design point, i.e. the failure probability is approximated by 

$$P(F) \approx \Phi(-\beta) \prod_{i=1}^{n-1} (1 - \beta\kappa_i)^{-1/2} \tag{33}$$

with the $\kappa_i$ the main curvatures of $G$ at the beta point $x^*$.

One objection against using SORM approximations is that it requires the 
numerical calculation of second derivatives and then an eigenvalue analysis 
for the modified Hessian at the beta point. 

But if a sequential quadratic programming scheme is used to find the beta
point, we can use this approximate Hessian of the Lagrangian for calculating
the SORM factor. Writing the SORM factor as a function of the main curvatures
is intuitively appealing, since it gives a clear geometric meaning, but
it is not well suited for a numerical analysis. This factor
$\prod_{i=1}^{n-1}(1-\beta\kappa_i)^{-1/2}$ can be written as (see [2], [3])

$$\prod_{i=1}^{n-1}(1-\beta\kappa_i)^{-1/2} = \det\bigl(\underbrace{(I_n - P)\,H\,(I_n - P) + P}_{=\,H^*(x^*)}\bigr)^{-1/2} \tag{34}$$

with

$$H = I_n + \frac{\beta}{|\nabla g(x^*)|}\,H_g(x^*) \tag{35}$$

and $P$ the projection matrix onto the gradient vector $\nabla g(x^*)$, i.e.

$$P = |\nabla g(x^*)|^{-2}\,\nabla g(x^*)\,\nabla g(x^*)^T. \tag{36}$$

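By a suitable rotation we can always achieve that the beta point is
$x^* = (0, \dots, 0, \beta)^T$. In this case $H^*(x^*)$ has the form

$$\begin{pmatrix} 1 + \lambda^* g_{11}(x^*) & \cdots & \lambda^* g_{1,n-1}(x^*) \\ \lambda^* g_{21}(x^*) & \cdots & \lambda^* g_{2,n-1}(x^*) \\ \vdots & & \vdots \\ \lambda^* g_{n-1,1}(x^*) & \cdots & 1 + \lambda^* g_{n-1,n-1}(x^*) \end{pmatrix}$$

with $\lambda^* = |\nabla g(x^*)|^{-1}\beta$. Now the gradient is
$\nabla g(x^*) = |\nabla g(x^*)|\,(0, \dots, 0, 1)^T$
and the modified Hessian $L(x,\lambda)$ at $(x^*, \lambda^*)$ is

$$\begin{pmatrix} 1 + \lambda^* g_{11}(x^*) & \cdots & \lambda^* g_{1n}(x^*) & 0 \\ \lambda^* g_{21}(x^*) & \cdots & \lambda^* g_{2n}(x^*) & 0 \\ \vdots & & \vdots & \vdots \\ \lambda^* g_{n1}(x^*) & \cdots & 1 + \lambda^* g_{nn}(x^*) & g_n(x^*) \\ 0 & \cdots & g_n(x^*) & 0 \end{pmatrix}$$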

Expanding the determinant of this matrix with respect to the last row and
column we see that

$$\det(L(x^*,\lambda^*)) = -g_n^2(x^*)\,\det(H^*(x^*)) = -|\nabla g(x^*)|^2\,\det(H^*(x^*)). \tag{37}$$

So we obtain

$$\det(H^*(x^*))^{-1/2} = |\nabla g(x^*)|\;|\det(L(x^*,\lambda^*))|^{-1/2}. \tag{38}$$

In this form the SORM factor can be computed directly from the matrix
which is used in the sequential quadratic programming method. Due to the
rotational symmetry of the standard normal density, this result is
independent of the chosen coordinate system and therefore we get

$$\prod_{i=1}^{n-1}(1-\beta\kappa_i)^{-1/2} = \frac{|\nabla g(x^*)|}{\sqrt{|\det(L(x^*,\lambda^*))|}}. \tag{39}$$
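The equality of the projection form (34) and the bordered-determinant form (39) can be checked numerically. The sketch below assumes $H = I_n + \beta\,|\nabla g(x^*)|^{-1} H_g(x^*)$ as the upper-left block of $L$; the helper names and the sample gradient and Hessian are hypothetical, chosen only to exercise both formulas in a non-rotated coordinate system.

```python
import math


def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in a]
    n = len(m)
    d = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            d = -d
        d *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return d


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def modified_hessian(grad, hess_g, beta):
    """H = I + beta / |grad| * hess_g (assumed form of the block in L)."""
    n = len(grad)
    lam = beta / math.sqrt(sum(v * v for v in grad))
    return [[(1.0 if i == j else 0.0) + lam * hess_g[i][j] for j in range(n)]
            for i in range(n)]


def sorm_factor_projection(grad, hess_g, beta):
    """det((I - P) H (I - P) + P)^(-1/2), the projection form (34)."""
    n = len(grad)
    norm2 = sum(v * v for v in grad)
    h = modified_hessian(grad, hess_g, beta)
    p = [[grad[i] * grad[j] / norm2 for j in range(n)] for i in range(n)]
    q = [[(1.0 if i == j else 0.0) - p[i][j] for j in range(n)] for i in range(n)]
    qhq = matmul(matmul(q, h), q)
    m = [[qhq[i][j] + p[i][j] for j in range(n)] for i in range(n)]
    return det(m) ** -0.5


def sorm_factor_bordered(grad, hess_g, beta):
    """|grad| / sqrt(|det L|) with L the bordered matrix, the form (39)."""
    n = len(grad)
    h = modified_hessian(grad, hess_g, beta)
    big = [h[i] + [grad[i]] for i in range(n)] + [list(grad) + [0.0]]
    return math.sqrt(sum(v * v for v in grad)) / math.sqrt(abs(det(big)))


# Hypothetical data: gradient not aligned with a coordinate axis.
grad = [1.0, 2.0, 2.0]
hess_g = [[0.2, 0.05, 0.0], [0.05, -0.1, 0.02], [0.0, 0.02, 0.1]]
factor_a = sorm_factor_projection(grad, hess_g, 2.5)
factor_b = sorm_factor_bordered(grad, hess_g, 2.5)
```

Both functions return the same value up to rounding, which is the point of (39): no eigenvalue analysis and no explicit rotation are needed once the bordered matrix is available.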
5 Numerical examples

Example 1

We consider the second example in the paper of Liu and Der Kiureghian [6].
Here a limit state function is given in the form

$$g(x) = x_1 + 2(x_2 + x_3) + x_4 - 5(x_5 + x_6) + 0.001 \sum_{i=1}^{6} \sin(100\,x_i). \tag{40}$$

The random variables $X_1, \dots, X_6$ are independent and all have a lognormal
distribution. The variables $X_1$ to $X_4$ have mean 120 and standard deviation
12. The variable $X_5$ has mean 50 and standard deviation 15. $X_6$ has mean
40 and standard deviation 12. The terms in the sum produce noise in the
function.

Liu/Der Kiureghian [6] found as beta point the point with the coordinates
(−.228, −.400, −.400, −.228, 1.75, 1.12) and record as $\beta$ the value 2.3482,
which is not the norm of this vector. The reason might be some rounding
error. We found as beta point a point (−.167, −.330, −.330, −.167, 1.75, 1.19)
with $\beta = 2.18$. Since we had convergence to this point even for the limit
state function without noise, we assume that it is the correct beta point.
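The remark about the norm can be checked directly from the printed coordinates. The short sketch below (variable names are ours) computes the Euclidean norm of both vectors as given above.

```python
import math


def norm(u):
    """Euclidean norm, i.e. the reliability index of a point in standard normal space."""
    return math.sqrt(sum(v * v for v in u))


# Coordinates as printed in the text.
point_liu = (-0.228, -0.400, -0.400, -0.228, 1.75, 1.12)
point_ours = (-0.167, -0.330, -0.330, -0.167, 1.75, 1.19)

beta_liu = norm(point_liu)    # about 2.177, not the recorded 2.3482
beta_ours = norm(point_ours)  # about 2.180
```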

The algorithm was stopped when the step length was less than $10^{-4}$. The
linearization method needed six evaluations of the gradient and additionally
123 evaluations of the function $g$. This is better than all recorded values for
the methods compared in [6].

This result shows that this algorithm is convergent even in the presence
of noise in the limit state function and that the computational effort in this
example is less than for other algorithms. Due to the presence of noise in the
limit state function it is not useful here to calculate a SORM factor.

Example 2 

A further study covers example 5.6, p. 97-101 in Madsen et al. [7]. Here a
plane frame structure is considered. Plastic hinge mechanisms are considered
as causing the failure of the structure.

The plastic moment capacities $X_1, \dots, X_5$ are lognormally distributed
random variables with means 134.9 kNm and standard deviations 13.49 kNm.
The load $X_6$ is also a random variable with a lognormal distribution having
mean 50 kN and standard deviation 15 kN. The load $X_7$, also lognormally
distributed, has mean 40 kN and standard deviation 12 kN. They
are all independent of each other. The limit state function is

$$g(x_1, \dots, x_7) = x_1 + 2x_3 + 2x_4 + x_5 - h \cdot x_6 - h \cdot x_7. \tag{41}$$

We take $h = 5$ m. The limit state function can be written directly as a
function of standard normal random variables $U_1, \dots, U_7$.

By running the sequential quadratic programming algorithm of Schittkowski
[11] as described in Liu/Der Kiureghian [6], but with a slight modification
to ensure numerical stability (i.e. only the projection of the Hessian on the
tangential space was reconstructed), we find the beta point

$$x^* = (-.22,\ 0,\ -.43,\ -.43,\ -.22,\ 2.39,\ 1.45). \tag{42}$$

The FORM approximation is $1.97 \times 10^{-3}$.

By evaluating the modified Hessian at the beta point, we found as SORM factor
the value $1/\sqrt{0.546} = 1.35$. The value obtained from taking the
reconstructed Hessian from the sequential quadratic programming algorithm
was $1/\sqrt{0.531} = 1.37$. This gives as SORM approximation $2.66 \times 10^{-3}$
with the exact Hessian and $2.7 \times 10^{-3}$ with the approximate Hessian. The
exact failure probability for this example is $2.69 \times 10^{-3}$ (see [7], p. 116).
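The arithmetic of this example can be retraced from the printed numbers. The sketch below computes $\beta$ from the rounded coordinates in (42), the FORM value $\Phi(-\beta)$, and the two SORM values from the reported determinants 0.546 and 0.531. Because the coordinates are rounded to two digits, the recomputed FORM value only agrees with the reported one to about two significant digits.

```python
import math


def std_normal_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


# Beta point (42) as printed, rounded to two digits.
x_star = (-0.22, 0.0, -0.43, -0.43, -0.22, 2.39, 1.45)
beta = math.sqrt(sum(v * v for v in x_star))
form_recomputed = std_normal_cdf(-beta)

# SORM values from the reported FORM value and the reported determinants.
form_reported = 1.97e-3
sorm_exact_hessian = form_reported / math.sqrt(0.546)
sorm_approx_hessian = form_reported / math.sqrt(0.531)
```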



6 Summary and conclusions 

In this paper a method for constrained optimization for determining the 
design point in structural reliability problems was outlined. The method is 
similar to the Hasofer/Lind-Rackwitz/Fiessler algorithm, but it can be shown 
that under slight regularity conditions it will converge to a stationary point of 
the Lagrangian. This convergence is achieved, since in each step of the search 
the step length is varied such that the Lagrangian function of the problem 
decreases. 

In principle the most promising approach for calculating the design point
and the approximation for the failure probability appears to be a hybrid
method: first locate the approximate position of the point with the
linearization method, whose convergence velocity is linear, and then start a
sequential quadratic programming algorithm with quadratic convergence to
get the exact position of the point and, thence, also the SORM factor.
Abstract. We consider the initial-boundary value problem for the hyperbolic
partial differential equations of thermoelasticity theory for non-simple
materials. The new approach is based on considering the initial-boundary
value problem for these equations with control for temperature. We formulate
the control for temperature in terms of a maximal monotone set. Existence,
uniqueness and regularity of the solution to this initial-boundary value
problem are proved in Sobolev spaces. In our proof we use semigroup theory
and Hilbert space methods.

Keywords: control for temperature, optimal control, non-simple thermoelastic
materials, boundary-initial value problem, Hilbert space, Sobolev space,
semigroup theory, stochastic equations



1. Introduction 

The heat conduction problem with control for temperature was formulated
and solved by Duvaut and Lions (cf. [7]). This problem was also studied
in [3, 9, 21]. The linear thermoelasticity equations (for the hyperbolic
system) with control for temperature were investigated in [11]. We extend
these considerations in order to solve the initial-boundary value problem with
control for temperature for the linear thermoelasticity theory of non-simple
materials. We consider non-simple materials whose local state is characterized
by the temperature, its gradient, the time rate of change of temperature, the
deformation gradient and its gradient (cf. [6, 14]). Existence and uniqueness
of the solution for linear thermoelasticity of non-simple materials (without
control for temperature) were considered in [14]. Using the method of Sobolev
spaces and semigroup theory we prove that the solution of the boundary-initial
value problem with control for temperature for the equations describing
non-simple materials exists and is unique.

In order to formulate the control for temperature (in generalized form) we
use the theory of maximal monotone operators.
Our paper is organized as follows.

In section 2 the statement of the problem is given. Section 3 is devoted to
the proof of the theorem on existence and uniqueness of the solution to the
initial-boundary value problem with control for temperature for the equations
describing non-simple thermoelastic materials. Finally, in the last section
some concluding remarks are given.



2. Statement of the problem

Let $\Omega$ be a domain in $\mathbb{R}^3$ with smooth boundary $\Gamma = \partial\Omega$.
The time variable $t$ takes values in $[0, T] \subset \mathbb{R}$.

We consider a thermoelastic, anisotropic medium characterized by the
temperature, its gradient, the time rate of change of temperature, the
deformation gradient and its gradient, i.e. a so-called non-simple material
(cf. [6]), occupying the domain $\Omega$. For $x \in \Omega$ and $t \in [0,T]$ we denote
by $u = u(x,t)$ the displacement vector field of the medium and by
$\Theta = \Theta(x,t)$ the temperature of the medium. Below, we consider the
boundary-initial value problem for thermoelasticity of non-simple materials
with control for temperature. We now describe the meaning of the control for
temperature.

In many technological applications the temperature of the medium is required
to take values in the interval $[\Theta_1(x), \Theta_2(x)]$ for any $t \in (0, T)$.

How can this be achieved?

To this end we must control a volume heat source with intensity $g$, which
plays the role of the temperature controller.

Let the intensity of this heat source belong to the interval $[g_1, g_2]$ (we
assume that $0 \in [g_1, g_2]$; $g_i$, $i = 1, 2$, may also be equal to the intensity).

We control the intensity of the additional source as follows:

1° If $\Theta(x,t) \in [\Theta_1(x), \Theta_2(x)]$, then $g = 0$.

2° If $\Theta(x,t) \notin [\Theta_1(x), \Theta_2(x)]$, then we supply heat in direct
proportion to the difference between the temperature $\Theta(x,t)$ and the
interval $[\Theta_1(x), \Theta_2(x)]$. So we have, for $\Theta(x,t) > \Theta_2(x)$,

$$-g = \begin{cases} k_2(\Theta - \Theta_2) & \text{if } k_2(\Theta - \Theta_2) \le g_2, \\ g_2 & \text{if } k_2(\Theta - \Theta_2) > g_2, \end{cases}$$

and, for $\Theta(x,t) < \Theta_1(x)$,

$$-g = \begin{cases} k_1(\Theta - \Theta_1) & \text{if } k_1(\Theta - \Theta_1) \ge g_1, \\ g_1 & \text{if } k_1(\Theta - \Theta_1) < g_1. \end{cases}$$
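We can write these cases as follows:

$$-g = \beta(\Theta),$$

where

$$\beta(\Theta) = \begin{cases} g_1 & \text{if } \Theta \le \Theta_1 + \frac{1}{k_1} g_1, \\ k_1(\Theta - \Theta_1) & \text{if } \Theta_1 + \frac{1}{k_1} g_1 \le \Theta \le \Theta_1, \\ 0 & \text{if } \Theta_1 \le \Theta \le \Theta_2, \\ k_2(\Theta - \Theta_2) & \text{if } \Theta_2 \le \Theta \le \Theta_2 + \frac{1}{k_2} g_2, \\ g_2 & \text{if } \Theta \ge \Theta_2 + \frac{1}{k_2} g_2. \end{cases}$$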
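The single-valued control law above is a saturated proportional controller. A minimal sketch, assuming $g_1 \le 0 \le g_2$ and $k_1, k_2 > 0$ (the function name `beta_control` and the sample parameter values are ours):

```python
def beta_control(theta, theta1, theta2, k1, k2, g1, g2):
    """Saturated proportional control law beta(Theta), assuming g1 <= 0 <= g2, k1, k2 > 0."""
    if theta < theta1:
        # Proportional branch below the interval, cut off at the lower bound g1.
        return max(k1 * (theta - theta1), g1)
    if theta > theta2:
        # Proportional branch above the interval, cut off at the upper bound g2.
        return min(k2 * (theta - theta2), g2)
    # Inside [theta1, theta2] the controller is inactive.
    return 0.0


# Hypothetical parameters: admissible band [10, 20], gains 2 and 4, bounds -6 and 8.
params = (10.0, 20.0, 2.0, 4.0, -6.0, 8.0)
samples = [beta_control(t, *params) for t in (5.0, 8.0, 15.0, 21.0, 30.0)]
```

The function is nondecreasing in `theta`, which is the monotonicity property that the generalization to maximal monotone operators builds on.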

We can describe the control for temperature more generally in terms of a
maximal monotone operator, i.e.

$$-g \in \beta(\Theta),$$

where $\beta : \Omega \times \mathbb{R} \to \mathbb{R}$ is a multivalued operator such that for all
$x \in \Omega$ the mapping $\Theta \to \beta(x, \Theta)$ is a maximal monotone operator. Below,
we give some examples of the operator $\beta(\cdot)$.

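Example 2.1. Jumping control.

Let $k_1 = k_2 = +\infty$, $-\infty < g_1, g_2 < +\infty$. In this case the intensity of the
heat source is a bounded operator $\beta(\cdot)$ and has the following form:

$$\beta(\Theta) = \begin{cases} g_1 & \text{if } \Theta < \Theta_1, \\ [g_1, 0] & \text{for } \Theta = \Theta_1, \\ 0 & \text{for } \Theta_1 < \Theta < \Theta_2, \\ [0, g_2] & \text{for } \Theta = \Theta_2, \\ g_2 & \text{for } \Theta > \Theta_2, \end{cases}$$

for given $\Theta_1, \Theta_2$ ($\Theta_1 < \Theta_2$) and $g_1, g_2$ ($0 \in (g_1, g_2)$).

Fig. 2.1.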
Example 2.2. Upper restriction for the temperature. In this case we have

$$\beta(\Theta) = \begin{cases} 0 & \text{if } \Theta < \Theta_2, \\ [0, +\infty) & \text{if } \Theta = \Theta_2, \\ \emptyset & \text{if } \Theta > \Theta_2. \end{cases}$$

Fig. 2.2.



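Example 2.3. Lower restriction for the temperature

$$\beta(\Theta) = \begin{cases} \emptyset & \text{if } \Theta < \Theta_1, \\ (-\infty, 0] & \text{if } \Theta = \Theta_1, \\ 0 & \text{if } \Theta > \Theta_1. \end{cases}$$

Fig. 2.3.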



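Example 2.4. Lower and upper restriction for the temperature

$$\beta(\Theta) = \begin{cases} \emptyset & \text{if } \Theta < \Theta_1, \\ (-\infty, 0] & \text{if } \Theta = \Theta_1, \\ 0 & \text{if } \Theta_1 < \Theta < \Theta_2, \\ [0, +\infty) & \text{if } \Theta = \Theta_2, \\ \emptyset & \text{if } \Theta > \Theta_2. \end{cases}$$

Fig. 2.4.

We assume that $D(\beta) = \mathbb{R}$.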

We seek the displacement vector field $u$ and the temperature $\Theta$
which satisfy the equations of thermoelasticity theory for non-simple
materials [6, 11]. A linear theory of thermoelasticity for non-simple materials
based upon an entropy production inequality proposed by Green and Laws
[12] was devised by Ieşan [14].

In that theory the local state of non-simple materials is characterised by
the temperature, its gradient, the time rate of change of temperature, the
deformation gradient and its gradient. Below, we consider the linear theory
of thermoelasticity for non-simple materials as it is established in [14]. The
basic equations in that theory are as follows:

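the equation of motion

$$\tau_{ji,j} - \mu_{sji,sj} + f_i = \rho\,\ddot u_i, \tag{2.1}$$

the equation of energy

$$\rho T_0 \dot\eta = q_{i,i} + S, \tag{2.2}$$

the constitutive equations

$$\begin{aligned}
\tau_{ij} &= A_{ijrs} e_{rs} + B_{ijpqr}\chi_{pqr} + a_{ij}(\Theta + \alpha\dot\Theta), \\
\mu_{ijk} &= B_{rsijk} e_{rs} + C_{ijkmnr}\chi_{mnr} + c_{ijk}(\Theta + \alpha\dot\Theta), \\
\rho\eta &= a + d\Theta + h\dot\Theta - b_i\Theta_{,i} - a_{ij}e_{ij} - c_{ijk}\chi_{ijk}, \\
q_i &= T_0(b_i\dot\Theta + k_{ij}\Theta_{,j}),
\end{aligned} \tag{2.3}$$

the kinematic relations

$$2e_{ij} = u_{i,j} + u_{j,i}, \qquad \chi_{ijk} = u_{k,ij}. \tag{2.4}$$
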

In these relations we used the following notations: $\rho$: the constant mass
density; $T_0$: the constant absolute temperature of the body in its reference
state; $u_i$: the components of the displacement vector; $\Theta$: the temperature
variation measured from the constant temperature $T_0$; $e_{ij}$ and $\chi_{ijk}$: the
kinematic characteristics of the body; $\tau_{ij}$ and $\mu_{ijk}$: the components of the
stress and hyperstress tensors; $\eta$: the specific entropy; $q_i$: the components
of the heat flux vector; $f_i$: the components of the body force per unit volume;
$A_{ijrs}$, $B_{ijpqr}$, $C_{ijkmnr}$, $c_{ijk}$, $a_{ij}$, $k_{ij}$, $\alpha$, $d$, $h$ and $b_i$ are
characteristic constants of the material and they obey the symmetry relations:

$$\begin{aligned}
A_{ijrs} &= A_{rsij} = A_{jirs}, & B_{ijpqr} &= B_{jipqr} = B_{ijqpr}, \\
C_{ijkpqr} &= C_{pqrijk} = C_{jikpqr}, & c_{ijk} &= c_{jik}, \quad a_{ij} = a_{ji}, \quad k_{ij} = k_{ji}.
\end{aligned} \tag{2.5}$$

The entropy inequality implies

$$(d\alpha - h)\dot\Theta^2 + 2b_i\dot\Theta\,\Theta_{,i} + k_{ij}\Theta_{,i}\Theta_{,j} \ge 0. \tag{2.6}$$

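Substituting now (2.3) and (2.4) into relations (2.1) and (2.2), using the
symmetry relations (2.5), replacing the equality (=) by the relation $\in$ in
the equation (2.2) and adding the term $\beta(x,\Theta)$ on the right-hand side of
this relation, we obtain the following system of coupled equations:

$$\begin{aligned}
\partial_t^2 u_i = {} & A_{jirs}u_{r,sj} + B_{jipqr}u_{r,pqj} + a_{ji}(\Theta_{,j} + \alpha\,\partial_t\Theta_{,j}) \\
& - B_{mnsji}u_{m,nsj} - C_{sjimnr}u_{r,mnsj} - c_{sji}(\Theta_{,sj} + \alpha\,\partial_t\Theta_{,sj}) + f_i, \quad i = 1,2,3,
\end{aligned} \tag{2.7}$$

$$\partial_t^2\Theta \in -d\,\partial_t\Theta + 2b_i\,\partial_t\Theta_{,i} + a_{ij}\,\partial_t u_{i,j} + c_{ijk}\,\partial_t u_{k,ij} + k_{ij}\Theta_{,ij} + S + \beta(x,\Theta) \tag{2.8}$$

(without loss of generality we assume that $\rho = 1$, $h = 1$ and $T_0 = 1$).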

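We adjoin the following boundary conditions to the system of field equations
(2.7), (2.8):

$$\begin{aligned}
u_i(x,t) &= 0 && \text{for } (x,t) \in \Gamma \times (0,T), \\
\Theta(x,t) &= 0 && \text{for } (x,t) \in \Gamma \times (0,T), \\
\frac{\partial u_i}{\partial n}(x,t) &= 0 && \text{for } (x,t) \in \Gamma \times (0,T), \quad i = 1,2,3,
\end{aligned} \tag{2.9}$$

where $\partial u_i/\partial n = u_{i,j}n_j$ and $n_j$ are the direction cosines of the outward
normal to the boundary $\Gamma = \partial\Omega$, and the initial conditions

$$\begin{aligned}
\Theta(x,0) &= \Theta^0(x), & \partial_t\Theta(x,0) &= \Theta^1(x) && \text{for } x \in \Omega, \\
u(x,0) &= u^0(x), & \partial_t u(x,0) &= u^1(x) && \text{for } x \in \Omega.
\end{aligned} \tag{2.10}$$

REMARK. Relation (2.8) describes the problem with the control for temperature
for the non-simple thermoelastic materials.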



The following assumptions on the material properties will be used throughout
this paper:

$$\alpha > 0, \quad h = 1, \quad \rho = 1, \quad T_0 = 1,$$

$$(d\alpha - h)\zeta^2 + 2b_i\zeta\zeta_i + k_{ij}\zeta_i\zeta_j \ge k_0(\zeta^2 + \zeta_i\zeta_i), \quad k_0 > 0, \tag{2.11}$$

for all arbitrary $\zeta$ and $\zeta_i$;

$$A_{ijrs}\xi_{ij}\xi_{rs} + 2B_{ijmnr}\xi_{ij}\chi_{mnr} + C_{ijkpqr}\chi_{ijk}\chi_{pqr} \ge a_0(\xi_{ij}\xi_{ij} + \chi_{ijk}\chi_{ijk}), \quad a_0 > 0, \tag{2.12}$$

for all arbitrary

$$\xi_{ij} = \xi_{ji}, \qquad \chi_{ijk} = \chi_{jik}. \tag{2.13}$$

The above assumptions are in agreement with the restrictions imposed in
[14]. The assumption (2.11) represents a considerable strengthening of the
consequence (2.6) of the entropy production inequality. The condition (2.12)
was used in [13] in order to obtain the existence and uniqueness of the solution
of the boundary value problems in a non-simple theory of elastostatics.

In the next section we prove the existence and uniqueness of the solution to
the problem (2.7), (2.8), (2.9), (2.10).



3. Existence and uniqueness of the weak solutions

Let X be Hilbert space: 

X = {w = (u, ue [H J,(fi) n (n)f , 

G [Mo(fi)f , 0 G iP G Ho(l^)} 

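Now we define the following operators for $W = (u, v, \Theta, \psi)$:

$$\begin{aligned}
\mathbb{A}_i(W) &= v_i, \\
\mathbb{B}_i(W) &= A_{jirs}u_{r,sj} + B_{jipqr}u_{r,pqj} + a_{ji}(\Theta_{,j} + \alpha\psi_{,j}) \\
&\quad - B_{mnsji}u_{m,nsj} - C_{sjimnr}u_{r,mnsj} - c_{sji}(\Theta_{,sj} + \alpha\psi_{,sj}), \\
\mathbb{C}(W) &= \psi, \\
\mathbb{D}(W) &= -d\psi + 2b_i\psi_{,i} + a_{ij}v_{i,j} + c_{ijk}v_{k,ij} + k_{ij}\Theta_{,ij}
\end{aligned} \tag{3.2}$$

and the operator

$$\mathcal{A}(W) = (\mathbb{A}(W), \mathbb{B}(W), \mathbb{C}(W), \mathbb{D}(W)) \tag{3.3}$$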



with the domain; 



£)(^) = {WgX:^(W) eX,v = ^ = 0 = V' = 0on r = 5fi} 

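Let $\bar\beta \subset L^2(\Omega) \times L^2(\Omega)$ be the operator defined as follows:

$$\bar\beta(\Theta) = \{\eta \in L^2(\Omega) : \eta(x) \in \beta(\Theta(x)) \text{ for a.e. } x \in \Omega\}. \tag{3.4}$$

$\bar\beta$ is also a maximal monotone operator.
Now we define the operator $\mathcal{B}$ as follows:

$$\mathcal{B} : X \to X, \qquad \mathcal{B} = [B_1, B_2, B_3, B_4], \tag{3.5}$$

where

$$B_1 W = 0, \qquad B_2 W = 0, \qquad B_3 W = \bar\beta(\Theta), \qquad B_4 W = 0. \tag{3.6}$$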

L»(B) = [L^(W)]^ X D0) 

From (3.3), (3.6) and [3] it follows that B is maximal monotone operator. 

Introducing the vector $W = (u, \partial_t u, \Theta, \partial_t\Theta)$ we can convert the
initial-boundary value problem (2.1)-(2.6) with the control for temperature
into an evolution equation in the Hilbert space $X$:

$$\partial_t W + \mathcal{A}W + \mathcal{B}W \ni F \quad \text{for } t \in (0,T), \qquad W(0) = W^0, \tag{3.7}$$

where $F = (0, f, 0, S)$ and

$$W^0 = (u^0, u^1, \Theta^0, \Theta^1). \tag{3.8}$$

The symbol $\ni$ has the following meaning:

$$(x_1, \dots, x_7, x_8) \in A_1 \times \dots \times A_7 \times A_8 \iff A_i = \{x_i\} \text{ for } i = 1, 2, \dots, 6, 8, \quad x_7 \in A_7.$$
Now we use results of semigroup theory in order to obtain the
existence and uniqueness of the solution to the problem (3.7)-(3.8).

It is easy to see that the operator $\mathcal{A}$ has the following properties:

1) $D(\mathcal{A})$ is dense in $X$,

2) $\langle \mathcal{A}W, W \rangle \ge 0 \quad \forall W \in D(\mathcal{A})$, (3.9)

3) $R(\lambda I - \mathcal{A}) = X, \quad \forall \lambda > 0$.

Based on the properties (3.9), 1)-3), we deduce that the operator $\mathcal{A}$ is a
maximal monotone operator in $X \times X$ (cf. [21]).

The operator $\mathcal{B}$ is also maximal monotone.

If $D(\mathcal{A}) \cap \operatorname{Int} D(\mathcal{B}) \ne \emptyset$, then from the Rockafellar-Moreau theorem
it follows that the operator $K = \mathcal{A} + \mathcal{B}$ generates a $C_0$ semigroup of
contractions in the Hilbert space $X$. So (cf. [3], p. 131, 136) we can prove
that there exists a unique (weak) solution to the problem (3.7)-(3.8). This
means that the following theorem is true:



Theorem 2.1. If $W^0 \in D(K)$ and $F \in W^{1,2}((0,T); X)$, then there exists a
unique solution of the problem

$$\partial_t W + KW \ni F, \tag{3.10}$$

$$W(0) = W^0 \tag{3.11}$$

with the properties

$$W \in W^{1,\infty}((0,T); X), \qquad (u, \partial_t u, \Theta, \partial_t\Theta) \in W^{1,\infty}((0,T); X).$$

1) $W^{k,2}(I, V)$, $k \in \mathbb{N}$, denotes the space of measurable functions $f : I \to V$
with $d^n f/dt^n \in L^2(I, V)$ for $0 \le n \le k$ (derivatives in the weak sense); the
norm in $W^{k,2}(I, V)$ is given by $\|f\|^2 = \sum_{n=0}^{k} \int_0^T \|d^n f(\cdot,t)/dt^n\|_V^2\,dt$.

2) $W^{1,\infty}(I, V) = \{f : I \to V,\ d^n f/dt^n \in L^\infty(I, V),\ 0 \le n \le 1\}$, where $I = (0,T)$.
4. Concluding remarks

Remark 4.1. Applying methods of convex analysis, one can investigate in a
similar way the problem with control for the temperature on the
boundary $\Gamma = \partial\Omega$.

Remark 4.2. The problem (3.11) has a solution of the form

$$W(t) = T(t)W^0 + \int_0^t T(t-s)F(s)\,ds \quad \text{for } t \in (0,T), \tag{4.1}$$

where $T(t)$ is the $C_0$-semigroup of contractions on $X$ generated by the operator
$K$.



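Remark 4.3. In order to obtain a more regular solution to the problem of
optimal control of the temperature we can replace the maximal monotone
operator $\beta$ by a smooth function $\beta_0$.

If we take into account the maximal monotone operator $\beta(\Theta)$ (cf. section 2),
we can take the Friedrichs mollifier of the continuous function $\beta_c(\Theta)$ of the
form

$$\beta_c(\Theta) = \begin{cases} g_1 & \text{if } \Theta < \Theta_1 - \varepsilon, \\ \frac{g_1}{2\varepsilon}(\Theta_1 + \varepsilon - \Theta) & \text{if } \Theta_1 - \varepsilon \le \Theta < \Theta_1 + \varepsilon, \\ 0 & \text{if } \Theta_1 + \varepsilon \le \Theta < \Theta_2 - \varepsilon, \\ \frac{g_2}{2\varepsilon}(\Theta + \varepsilon - \Theta_2) & \text{if } \Theta_2 - \varepsilon \le \Theta \le \Theta_2 + \varepsilon, \\ g_2 & \text{if } \Theta > \Theta_2 + \varepsilon, \end{cases} \tag{4.2}$$

for fixed but arbitrarily small $\varepsilon > 0$.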

Then the initial condition has a very important influence on the regularity
of the solutions. Under some assumptions we get the solution with the required
regularity.

For example, let us denote

$$D(K^k) = \{v \in D(K^{k-1}) : Kv \in D(K^{k-1})\} \tag{4.3}$$

for $k \ge 2$, $k \in \mathbb{N}$. $D(K^k)$ is a Hilbert space with the inner product

$$(u, v)_{D(K^k)} = \sum_{i=0}^{k} (K^i u, K^i v). \tag{4.4}$$

So we have (cf. [5]):

Let $W^0 \in D(K^k)$, $k \ge 2$; then the initial value problem (3.11) has the
unique solution $W$ given by formula (4.1) (for $F$ being a $C^k$-function)
with the property

$$W \in C^{k-j}([0,T]; D(K^j)) \quad \text{for } j = 0, 1, \dots, k. \tag{4.5}$$

Remark 4.4. We can also extend our results to the stochastic equation
describing thermoelastic non-simple materials. For example, we can consider
on the right-hand side of the equation (3.9) a term which is the weak
derivative of a Wiener process. The initial value is then a random variable.
Using Hilbert space methods we can obtain the behaviour of the probability
distribution $\mu_t$ of $W(t, \cdot)$.
1. Introduction

Optimal trajectory planning for robots is a basic tool for improving
manufacturing processes.

A robot is a multi-body mechanical system. The bodies are driven by torques
and forces generated by the motors of the robot. The relationship between
the input (torques and forces generated by motors) and output (position of
the robot in configuration space) can be defined by the dynamic equation [1]:

$$\sum_{j=1}^{n} J_{ij}(q,p)\,\ddot q_j + \sum_{j,k=1}^{n} D_{ijk}(q,p)\,\dot q_j \dot q_k + G_i(q,p) = \tau_i, \quad 1 \le i \le n, \tag{1}$$

where the following notations are used:

$q$ : vector of configuration coordinates of the robot,
$p$ : vector of the model parameters,
$\tau_i$ : torques and forces generated by motors,
$J_{ij}$ : elements of the inertia matrix of the robot,
$D_{ijk}$ : coefficients of the centrifugal and Coriolis forces,
$G_i$ : gravity forces.

Introducing a geometric path parameter $s$, we can represent the trajectory
$q = q(t)$ as a function of $s$, $q = q(s)$, while $t$ and $s$ are connected by means of
the function $s = s(t)$. This then yields

$$\dot q_i = q_i'\,\dot s, \tag{2}$$

* Supported by DFG-Schwerpunktprogramm
"Echtzeit-Optimierung großer Systeme".
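$$\ddot q_i = q_i'\,\ddot s + q_i''\,\dot s^2, \tag{3}$$

$$\ddot s = \tfrac{1}{2}\beta', \tag{4}$$

where $(\ )' = \frac{d}{ds}$ and $(\dot{\ }) = \frac{d}{dt}$ denote the derivative with respect to the
path parameter $s$ and the time $t$, respectively. With these definitions we can
rewrite the dynamic equation as follows:

$$a_i(q,q',p)\,\beta' + b_i(q,q',q'',p)\,\beta + c_i(q,p) = \tau_i, \quad 1 \le i \le n, \tag{5}$$

where

$$\beta = \dot s^2, \tag{5.1}$$

$$a_i = \frac{1}{2}\sum_{j=1}^{n} J_{ij}(q,p)\,q_j', \quad 1 \le i \le n, \tag{5.2}$$

$$b_i = \sum_{j=1}^{n} J_{ij}(q,p)\,q_j'' + \sum_{j,k=1}^{n} D_{ijk}(q,p)\,q_j' q_k', \quad 1 \le i \le n, \tag{5.3}$$

$$c_i = G_i(q,p), \quad 1 \le i \le n. \tag{5.4}$$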

The problem of trajectory planning for robots is then to find the functions
$\beta = \beta(s)$ and $q = q(s)$ such that a given objective function is optimized, the
available limits for the torques and forces $\tau$ and the allowable limits for the
position $q$ and the velocity $\dot q$ are not exceeded, and the robot moves from a
prespecified initial position in its configuration space to a given terminal
point.

Mathematically the problem can be described as follows [2,3,4]:

$$\min_{\beta, q} \int_0^{s_e} f_0(s, q, q', q'', \beta, \beta')\,ds \tag{6}$$

subject to

$$\beta(0) = \beta(s_e) = 0, \quad \beta(s) \ge 0, \quad 0 \le s \le s_e, \tag{6.1}$$

$$q(0) = q_0, \quad q(s_e) = q_e, \tag{6.2}$$

$$q_{\min} \le q \le q_{\max}, \quad \dot q_{\min} \le \dot q \le \dot q_{\max}, \quad 0 \le s \le s_e, \tag{6.3}$$

$$\tau_{\min,i} \le a_i\beta' + b_i\beta + c_i \le \tau_{\max,i}, \quad 1 \le i \le n, \quad 0 \le s \le s_e, \tag{6.4}$$
where $q_{\min}$, $q_{\max}$, $\dot q_{\min}$, $\dot q_{\max}$, $\tau_{\min,i}$ and $\tau_{\max,i}$ describe the bounds
for the configuration coordinates, their derivatives with respect to time $t$,
and the torques and forces generated by the motors, respectively.

As an objective function we consider a linear combination of the criteria of
the time optimal and the energy minimal problem:






( 7 ) 



If $\kappa_t = 1$ and $\kappa_e = 0$, we get the time optimal problem, else a mixed problem.



2. Substitute Problems

Very often the model parameter $p$ is not exactly known. To get a more reliable
solution of the planning problem, one should utilize the available statistical
information about the uncertainty. Hence, for the trajectory problem with
random parameter disturbances, we propose the following substitute problems
[2,4]:

2.1. Chance Constrained Substitute Problem

$$\min_{\beta, q} \int_0^{s_e} E_p f_0(s, q, q', q'', \beta, \beta', p)\,ds \tag{8}$$

subject to:

$$\beta(0) = \beta(s_e) = 0, \quad \beta(s) \ge 0, \quad 0 \le s \le s_e, \tag{8.1}$$

$$q(0) = q_0, \quad q(s_e) = q_e, \tag{8.2}$$

$$q_{\min}(s) \le q(s) \le q_{\max}(s), \tag{8.3}$$

$$\dot q_{\min}(s) \le q'(s)\sqrt{\beta(s)} \le \dot q_{\max}(s), \quad 0 \le s \le s_e, \tag{8.4}$$

$$P\bigl(\tau_{\min,i} \le a_i\beta' + b_i\beta + c_i \le \tau_{\max,i}\bigr) \ge \alpha_i, \quad 1 \le i \le n, \quad 0 \le s \le s_e. \tag{8.5}$$

Here, we consider the expected value of the objective function, and we demand
that the stochastic conditions (6.4) are fulfilled separately with given minimum
probabilities $\alpha_i$.

2.2. Cost Constrained Substitute Problem

$$\min_{\beta, q} \int_0^{s_e} E_p f_0(s, q, q', q'', \beta, \beta', p)\,ds \tag{9}$$

subject to:

$$\beta(0) = \beta(s_e) = 0, \quad \beta(s) \ge 0, \quad 0 \le s \le s_e, \tag{9.1}$$

$$q(0) = q_0, \quad q(s_e) = q_e, \tag{9.2}$$

$$q_{\min}(s) \le q(s) \le q_{\max}(s), \tag{9.3}$$

$$\dot q_{\min}(s) \le q'(s)\sqrt{\beta(s)} \le \dot q_{\max}(s), \quad 0 \le s \le s_e, \tag{9.4}$$

$$E_p \Gamma_i(a_i\beta' + b_i\beta + c_i) \le S_i, \quad 1 \le i \le n, \quad 0 \le s \le s_e, \tag{9.5}$$

where $\Gamma_i$ are given cost functions for the evaluation of the violation of the
restrictions (6.4). In this substitute problem we consider the expected value
of the objective function and demand that given upper bounds $S_i$ for the
expected values of the cost functions are not exceeded.

3. Numerical Methods for the Substitute Problem 
3.1. Calculation of the Probabilities 

To solve the chance constrained problem (8), (8.1)-(8.5) we first have to
compute the probabilities (8.5). Since many model parameters appear linearly
in the dynamic equations of robots and lie in some given intervals, we consider
the following model:






( 10 ) 



where a*, i=l,2,...n, are uniformly distributed in and independent on 

each other. For the random variable y we have therefore 



P{y < 7 ]} = 



n!nr=i(A,-|^i|) 




(v - %■)+ (11) 



with Aj = di — Ci and /„(a) = ^ + ... + 

where j = 1,2, ...2^=:N, are the vertex points of [ci, di] x [c 2 , ^ 2 ] x ••• x 
[cn,dn] and yj = ^ ■ a^^'> + ^o, J = 1, 2, ...N. 



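For

$$y = a_1 + 2a_2 + 3a_3, \tag{12}$$

where $a_i$, $i = 1,2,3$, are independent and uniformly distributed in $[0, 1]$, we
get

$$P\{y \le \eta\} = \begin{cases}
\dfrac{\eta^3}{36} & \text{for } 0 \le \eta \le 1, \\[4pt]
\dfrac{1}{36} - \dfrac{\eta}{12} + \dfrac{\eta^2}{12} & \text{for } 1 \le \eta \le 2, \\[4pt]
\dfrac{1}{4} - \dfrac{5\eta}{12} + \dfrac{\eta^2}{4} - \dfrac{\eta^3}{36} & \text{for } 2 \le \eta \le 4, \\[4pt]
-\dfrac{55}{36} + \dfrac{11\eta}{12} - \dfrac{\eta^2}{12} & \text{for } 4 \le \eta \le 5, \\[4pt]
-5 + 3\eta - \dfrac{\eta^2}{2} + \dfrac{\eta^3}{36} & \text{for } 5 \le \eta \le 6.
\end{cases} \tag{13}$$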
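The piecewise formula (13) can be cross-checked against a vertex sum over the corners of the box, in the spirit of (11). Since parts of (11) are illegible in our copy, the sign rule used below (alternating with the number of coordinates at their upper bound, for positive coefficients) is an assumption taken from the standard inclusion-exclusion formula for sums of independent uniform variables; the names `cdf_linear_uniform` and `cdf_example` are ours.

```python
import itertools
import math


def cdf_linear_uniform(eta, coeffs, lower, upper, offset=0.0):
    """P(y <= eta) for y = offset + sum(coeffs[i] * a_i), a_i ~ U[lower[i], upper[i]].

    Vertex-sum formula, assuming all coefficients are positive.
    """
    n = len(coeffs)
    assert all(c > 0 for c in coeffs)
    norm = float(math.factorial(n))
    for c, lo, hi in zip(coeffs, lower, upper):
        norm *= (hi - lo) * c
    total = 0.0
    for picks in itertools.product((0, 1), repeat=n):
        # Value of y at this vertex of the box.
        y_vertex = offset + sum(c * (hi if pick else lo)
                                for c, lo, hi, pick in zip(coeffs, lower, upper, picks))
        sign = -1.0 if sum(picks) % 2 else 1.0
        total += sign * max(eta - y_vertex, 0.0) ** n
    return total / norm


def cdf_example(eta):
    """Piecewise polynomial (13) for y = a1 + 2 a2 + 3 a3 with a_i ~ U[0, 1]."""
    if eta <= 0.0:
        return 0.0
    if eta <= 1.0:
        return eta ** 3 / 36.0
    if eta <= 2.0:
        return 1.0 / 36.0 - eta / 12.0 + eta ** 2 / 12.0
    if eta <= 4.0:
        return 0.25 - 5.0 * eta / 12.0 + eta ** 2 / 4.0 - eta ** 3 / 36.0
    if eta <= 5.0:
        return -55.0 / 36.0 + 11.0 * eta / 12.0 - eta ** 2 / 12.0
    if eta <= 6.0:
        return -5.0 + 3.0 * eta - eta ** 2 / 2.0 + eta ** 3 / 36.0
    return 1.0


grid = [0.25 * i for i in range(0, 25)]
max_gap = max(abs(cdf_linear_uniform(e, [1.0, 2.0, 3.0], [0.0] * 3, [1.0] * 3) - cdf_example(e))
              for e in grid)
```

On the grid the two evaluations agree up to rounding, and the distribution function takes the value 1/2 at the mean $\eta = 3$, as symmetry requires.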



3.2. Calculation of the Expected Cost Functions

The computation of the expected values of the objective function (8), (9) and
of the cost functions (9.5) can be very difficult, because in general the
objective function and the cost functions are very complicated functions of
the model parameter $p$. If the derivatives of the functions with respect to $p$
up to a certain order exist, the expected values of these functions can be
calculated approximately by means of the Taylor expansion of the functions at
the expectation $\bar p$ of $p$ [4]. In this case we need only some central moments
but not the whole distribution of the stochastic variable $p$.

For a function $u = u(p, s)$ having derivatives with respect to the random
vector $p$ up to the $(K+1)$-th order, by means of Taylor expansion of $u$ at the
expectation $\bar p$ of $p$ we get

$$u(p,s) = u(\bar p, s) + \sum_{k=1}^{K} \frac{1}{k!} \frac{\partial^k u}{\partial p^k}(\bar p, s) \cdot (p - \bar p)^k + \frac{1}{(K+1)!} \frac{\partial^{K+1} u}{\partial p^{K+1}}(\tilde p, s) \cdot (p - \bar p)^{K+1}, \tag{14}$$

where $\tilde p$ is a point between $p$ and $\bar p$. Let $\mu_k$ be the system of $k$-th central
moments of $p$. Given the moments of the stochastic variable $p$ up to $K$-th
order, $E_p u(p, s)$ can be calculated approximately by

$$E_p u(p, s) \approx u(\bar p, s) + \sum_{k=1}^{K} \frac{1}{k!} \frac{\partial^k u}{\partial p^k}(\bar p, s) \cdot \mu_k. \tag{15}$$
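For a scalar parameter the approximation (15) is a short sum. The sketch below evaluates it for a parameter uniformly distributed on [0, 15] (the payload distribution used later in the text) and two test functions chosen by us for illustration, $u(p) = p^2$ and $u(p) = e^{0.1p}$, for which the exact expectations are known in closed form.

```python
import math


def uniform_central_moment(k, lo, hi):
    """k-th central moment of the uniform distribution on [lo, hi]."""
    if k % 2:
        return 0.0
    return ((hi - lo) / 2.0) ** k / (k + 1)


def expected_value_taylor(derivs_at_mean, moments):
    """Right-hand side of (15) for a scalar parameter.

    derivs_at_mean[k] is the k-th derivative of u at the mean (k = 0 is u itself),
    moments[k] is the k-th central moment (moments[0] = 1, moments[1] = 0).
    """
    return sum(d * m / math.factorial(k)
               for k, (d, m) in enumerate(zip(derivs_at_mean, moments)))


lo, hi = 0.0, 15.0
mean = 0.5 * (lo + hi)
moments = [uniform_central_moment(k, lo, hi) for k in range(5)]

# u(p) = p^2: the expansion of order K = 2 is exact.
approx_square = expected_value_taylor([mean ** 2, 2.0 * mean, 2.0], moments[:3])
exact_square = (hi ** 3 - lo ** 3) / (3.0 * (hi - lo))

# u(p) = exp(0.1 p): order K = 4 gives a close approximation.
derivs_exp = [0.1 ** k * math.exp(0.1 * mean) for k in range(5)]
approx_exp = expected_value_taylor(derivs_exp, moments)
exact_exp = (math.exp(0.1 * hi) - math.exp(0.1 * lo)) / (0.1 * (hi - lo))
```

Only the central moments enter the approximation, in line with the remark above that the whole distribution of $p$ is not needed.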
3.3. Discretization

The solutions of the substitute problems are the functions $q = q(s)$ and
$\beta = \beta(s)$. Describing the functions $q_i = q_i(s)$ and $\beta = \beta(s)$ as linear
combinations of known functions $B_j(s)$, $1 \le j \le J$, and $B_j^0(s)$, $1 \le j \le J_0$,

$$q_i(s) = \sum_{l=1}^{J} q_{il}\,B_l(s), \quad 1 \le i \le n, \tag{16}$$

$$\beta(s) = \sum_{l=1}^{J_0} \beta_l\,B_l^0(s), \tag{17}$$

and putting these into the equations (2)-(2.3) or (3)-(3.3), we then get a
standard parameter optimization problem. This problem can then be solved
by means of numerical optimization methods. In our situation we have made
use of SQP-type algorithms.

The solution of the problem depends, of course, on the choice of the basis
functions $B_j(s)$, $1 \le j \le J$, and $B_j^0(s)$, $1 \le j \le J_0$. Because of their
well-known excellent properties, we choose cubic B-splines $B_j(s)$ and
quadratic B-splines $B_j^0(s)$.

The functions $B_j(s)$ and $B_j^0(s)$ have the following properties:

1. $B_1(0) = 1$, $B_j(0) = 0$, $j = 2, \dots, J$;
   $B_1^0(0) = 1$, $B_j^0(0) = 0$, $j = 2, \dots, J_0$.

2. $B_J(s_e) = 1$, $B_j(s_e) = 0$, $j = 1, \dots, J-1$;
   $B_{J_0}^0(s_e) = 1$, $B_j^0(s_e) = 0$, $j = 1, \dots, J_0 - 1$.

3. $B_j \ge 0$, $j = 1, \dots, J$, and $B_j^0 \ge 0$, $j = 1, \dots, J_0$.

4. For every $s \in (0, s_e)$ there are at most 4 nonvanishing $B_j$
   and 3 nonvanishing $B_j^0$.

According to these properties, condition (8.2) or (9.2) will be fulfilled
automatically by taking $q_0$ and $q_e$, respectively, as the first and last
coefficient in the representation (16) of $q(s)$. Choosing the first and last
coefficients of $\beta$ as zero and the others nonnegative guarantees conditions
(8.1) and (9.1). The locality Property 4 is very useful for the efficient
computation of the Jacobian matrix.
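Properties 1 to 4 can be verified numerically for B-splines on a clamped knot vector. The sketch below uses the Cox-de Boor recursion with uniformly spaced interior knots; the uniform spacing, the basis sizes and the function names are assumptions for this illustration, since the text does not specify the knot placement.

```python
def clamped_knots(num_basis, degree, s_end):
    """Clamped knot vector on [0, s_end] with uniformly spaced interior knots."""
    interior = num_basis - degree - 1
    assert interior >= 0
    step = s_end / (interior + 1)
    return ([0.0] * (degree + 1)
            + [step * (i + 1) for i in range(interior)]
            + [s_end] * (degree + 1))


def bspline_basis(j, degree, knots, s):
    """Value of the j-th (0-based) B-spline of the given degree at s (Cox-de Boor)."""
    if degree == 0:
        if knots[j] <= s < knots[j + 1]:
            return 1.0
        # Close the last non-empty knot interval on the right so s = s_end is covered.
        if s == knots[-1] and knots[j] < knots[j + 1] == knots[-1]:
            return 1.0
        return 0.0
    value = 0.0
    d_left = knots[j + degree] - knots[j]
    if d_left > 0.0:
        value += (s - knots[j]) / d_left * bspline_basis(j, degree - 1, knots, s)
    d_right = knots[j + degree + 1] - knots[j + 1]
    if d_right > 0.0:
        value += ((knots[j + degree + 1] - s) / d_right
                  * bspline_basis(j + 1, degree - 1, knots, s))
    return value


def basis_values(num_basis, degree, s_end, s):
    """All basis function values at s."""
    knots = clamped_knots(num_basis, degree, s_end)
    return [bspline_basis(j, degree, knots, s) for j in range(num_basis)]


# Hypothetical sizes: J = 7 cubic functions for q, J0 = 6 quadratic functions for beta.
cubic_start = basis_values(7, 3, 4.0, 0.0)
cubic_end = basis_values(7, 3, 4.0, 4.0)
quad_mid = basis_values(6, 2, 4.0, 1.7)
```

With the end values equal to one only for the first and last basis function, fixing the first and last coefficient of (16) fixes $q(0)$ and $q(s_e)$, which is exactly the argument made in the paragraph above.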

4. Applications on Manutec r3

To demonstrate our idea, we apply the method described above to solve the
optimal trajectory planning problem for the robot Manutec r3. The robot
Manutec r3 has in fact six rotary joints; the first three are mainly responsible
for the position of the end effector and the last three for the orientation of
the hand. To simplify our problem, the last three joints are fixed, so we have
only three degrees of freedom. The dynamic equations and the model parameters
can be found in [5]. We suppose that the payload $m_l$ is a random variable
uniformly distributed on the interval [0, 15].

4.1. Solution without Position and Velocity Restrictions

The time optimal problem is considered first, and the restrictions for the
configuration coordinates and their derivatives are neglected. The initial
position and the end position are chosen as $q_0 = (0, -1.5, 0)$ and
$q_e = (1, -1.95, 1)$, and the robot should move the payload from the initial
position to the end position as fast as possible.

This problem is solved for the reliabilities $\alpha_i = 0.99$ and $\alpha_i = 0.75$, and
we also solve this problem deterministically by replacing $m_l$ by its expected
value. The total run times are 0.5429 seconds for $\alpha_i = 0.99$, 0.5191 sec. for
$\alpha_i = 0.75$ and 0.4925 sec. for the deterministic case, respectively.
Obviously, the run time increases if the stochastic uncertainty is considered.
Moreover, we get a longer run time if the constraints are required to hold
with greater reliability $\alpha$.

In the following figures we show the configuration coordinates $q_2$ and $q_3$
as functions of $q_1$ and compare the results for the three cases. One gets
smaller values of $q_2$ and $q_3$ in the case of stochastic uncertainty, and
for higher reliability the functions $q_2$ and $q_3$ decrease further. Since $q_2$
is the angle between the upper arm and the vertical axis and $q_3$ is the angle
between the upper and lower arms, in this situation a smaller value of $q_2$
means a closer position of the elbow to the base of the robot and a larger
value of $q_3$ leads to a higher position of the payload. This means that in
order to guarantee a higher reliability, the robot tries to pull the elbow
towards itself and to prevent a too high position of the payload.




Fig. 2. Trajectories in configuration space without position restrictions
(curves for $\alpha_i = 0.99$, $\alpha_i = 0.75$ and the deterministic case)

Fig. 3. Trajectories in configuration space without position restrictions
(curves for $\alpha_i = 0.99$, $\alpha_i = 0.75$ and the deterministic case)

4.2. Solution with Position and Velocity Restrictions 

Now let us consider the situation with position and velocity constraints. The
bounds are the same as given in [5]. The total performing times are 0.5715
sec. for $\alpha_i = 0.99$, 0.5501 sec. for $\alpha_i = 0.75$ and 0.5298 sec.
for the deterministic case.




Fig. 4. Trajectories in configuration space under position restrictions
(curves for $\alpha_i = 0.99$, $\alpha_i = 0.75$ and the deterministic case)

Fig. 5. Trajectories in configuration space under position restrictions
(curves for $\alpha_i = 0.99$, $\alpha_i = 0.75$ and the deterministic case)

In the same way we show the configuration coordinates $q_2$ and $q_3$ as
functions of $q_1$ in Figure 4 and Figure 5. Due to the restrictions on
position and velocity, the feasible domain for the optimal solution becomes
smaller; hence the optimal trajectories for the different cases are closer to
each other. One can still see, however, that for a higher reliability the
robot tries to pull the elbow towards itself and to prevent the payload from
being held too high.

4.3. Planning Problem with a Mixed Objective Function 

In the next examples we study the influence of the weight factors in the
objective function on the results of the planning problem and on the
computing time. For the same initial and end positions as given above, we
solve the planning problem with $\kappa_t = 1$ and different values of
$\kappa_e$. The results are shown in the following tables:





         $\kappa_e = 0$    $\kappa_e = 0.001$    $\kappa_e = 0.0001$
$t_c$    920 sec           145 sec               374 sec
$t_f$    0.571528          0.578460              0.571605
$e_f$    38.72099          31.33955              37.81237
$z_f$    0.571528          0.609800              0.575386

Table 4.1. Stochastic case with $\alpha_i = 0.99$





         $\kappa_e = 0$    $\kappa_e = 0.001$    $\kappa_e = 0.0001$
$t_c$    1358 sec          120 sec               397 sec
$t_f$    0.529847          0.541252              0.531638
$e_f$    48.61951          32.19380              45.90686
$z_f$    0.529847          0.573445              0.536229

Table 4.2. Deterministic case



where $t_c$ denotes the computing time, and $t_f$, $e_f$ and $z_f$ are the
total performing time, the energy consumption and the value of the objective
function, respectively.
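
The tabulated values can be cross-checked with a few lines of code. The sketch below assumes that the mixed objective is the weighted sum $z_f = \kappa_t t_f + \kappa_e e_f$; this form is an assumption made here for illustration (it is not stated in this section), and the function name `mixed_objective` is hypothetical. The data are the three columns of Table 4.1.

```python
def mixed_objective(t_f, e_f, kappa_t=1.0, kappa_e=0.0):
    """Weighted sum of performing time and energy (assumed form of z_f)."""
    return kappa_t * t_f + kappa_e * e_f


# Columns of Table 4.1: (kappa_e, t_f, e_f, tabulated z_f)
TABLE_4_1 = [
    (0.0, 0.571528, 38.72099, 0.571528),
    (0.001, 0.578460, 31.33955, 0.609800),
    (0.0001, 0.571605, 37.81237, 0.575386),
]

# Largest deviation between the assumed formula and the tabulated z_f
max_deviation = max(
    abs(mixed_objective(t_f, e_f, 1.0, kappa_e) - z_f)
    for kappa_e, t_f, e_f, z_f in TABLE_4_1
)
```

Under this assumption the recomputed values agree with the printed ones up to rounding of the last digit.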

These results show that by an appropriate choice of $\kappa_e$ the computing
time may be reduced tremendously. To study this phenomenon, we perform the
computations for different tasks (different initial and end positions) and get
the following results, where Table 4.3 and Table 4.4 present the results for
$q_0 = (0, -1.5, 0)$, $q_e = (1, -1.5, 0)$ and Table 4.5 and Table 4.6 for
$q_0 = (0.228, 0.31, 1.75)$ and $q_e = (-0.147, 0.73, -1.82)$:





         $\kappa_e = 0$    $\kappa_e = 0.001$    $\kappa_e = 0.01$
$t_c$    430 sec           98 sec                52 sec
$t_f$    0.607482          0.612947              0.644437
$e_f$    43.95982          33.64664              24.20151
$z_f$    0.607482          0.646594              0.886452

Table 4.3. Stochastic case with $\alpha_i = 0.99$





         $\kappa_e = 0$    $\kappa_e = 0.001$    $\kappa_e = 0.01$
$t_c$    432 sec           69 sec                66 sec
$t_f$    0.559192          0.571750              0.611204
$e_f$    52.31024          31.35354              25.06793
$z_f$    0.559192          0.603104              0.861883

Table 4.4. Deterministic case

         $\kappa_e = 0$    $\kappa_e = 0.001$    $\kappa_e = 0.01$
$t_c$    1075 sec          497 sec               143 sec
$t_f$    1.024160          1.033095              1.101216
$e_f$    57.78957          45.49765              23.56912
$z_f$    1.024160          1.078592              1.336907

Table 4.5. Stochastic case with $\alpha_i = 0.99$





         $\kappa_e = 0$    $\kappa_e = 0.001$    $\kappa_e = 0.01$
$t_c$    925 sec           230 sec               163 sec
$t_f$    0.873024          0.884234              0.991281
$e_f$    72.14502          51.29047              23.95391
$z_f$    0.873024          0.935525              1.230821

Table 4.6. Deterministic case



From these results we can see that by means of an appropriate choice of
$\kappa_e$ the computing time is cut down considerably, while the performing
time increases only slightly. The influence of the weight factors on the
optimal trajectories is shown in the following figures:




Fig. 6. Solutions in the $q_2$ and $q_1$ space for different weight factors
(curves for $\kappa_e = 0.0$, $\kappa_e = 0.0001$ and $\kappa_e = 0.001$)

Fig. 7. Solutions in the $q_3$ and $q_1$ space for different weight factors
(curves for $\kappa_e = 0.0$, $\kappa_e = 0.0001$ and $\kappa_e = 0.001$)

These figures show that the robot tries to avoid too violent motions if the
energy consumption is also taken into account.

Abstract. In structural optimization one often encounters the situation that
some of the parameters, such as material data or load factors, are not exactly
known but have a well-known statistical behaviour. There are many ways to
reformulate the minimization problem such that the stochastic variation of the
parameters is taken into account already in the planning phase. One way to
solve the resulting so-called substitute problems is the response surface
method (RSM) with an adaptive matrix step size control. To obtain a fast and
stable implementation of this algorithm, the setting of some control
parameters is critical. This paper presents RSM as a technique capable of
solving stochastic optimization problems, and it also shows the difficulties
in selecting the control parameters in a proper implementation of the
procedure.

Keywords: stochastic optimization, response surface method, stochastic
approximation, matrix step size control



1 Basic problem 



An often used formulation of a stochastic optimization problem is

$$\min_{x \in D} \; E f(x, a(\omega)) \quad (=: F(x)), \qquad (1)$$

where the design variables $x \in \mathbb{R}^r$ lie in a relatively simple
region $D$ and $a = a(\omega)$ is a vector of independently distributed random
variables. There exists an abstract probability space and a corresponding
probability measure $P$ such that the expectation $E$ in (1) is well defined
and finite. Examples of regions $D$ defined by simple constraints are:

• box constraints, i.e., $D = \{x \in \mathbb{R}^r \mid x_l \le x \le x_u\}$, or

• linear "deterministic" restrictions, which means that no expensive
evaluation of the expectation is needed within the constraints.




2 Solution method 



The presented algorithm is in principle a stochastic approximation method
which uses estimates of the first and second order derivatives of the
objective function $F$. Deterministic constraints are taken into account by
means of a projection operator $p_D$, such that the main rule of the algorithm
is

$$x_{n+1} := p_D(x_n - \rho_n s_n), \quad n = 1, 2, \ldots, \qquad (2)$$

where $x_1 \in D$ is chosen arbitrarily. In (2), $\rho_n > 0$ denotes the step
size and $s_n$ the search direction.

To get feasible iteration points, a simple optimization problem has to be
solved in every iteration step, namely

$$p_D(x) := \arg\min_{y \in D} \|y - x\|^2. \qquad (3)$$

The projection is a relatively inexpensive operation because of the assumed
simple structure of $D$. In the following we consider appropriate selections
of the step size $\rho_n$ and the search direction $s_n$.
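
For box constraints the projection (3) has a closed form: each component is clipped to its bounds. The following minimal sketch illustrates this special case together with one step of rule (2); the function names are hypothetical and the code is an illustration only, not the implementation used in the paper.

```python
def project_onto_box(x, lower, upper):
    """Euclidean projection of x onto the box {y : lower <= y <= upper}.

    For a box the minimizer of ||y - x||^2 is obtained by clipping
    every component of x to its interval.
    """
    if not (len(x) == len(lower) == len(upper)):
        raise ValueError("x, lower and upper must have the same length")
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]


def projected_step(x, s, rho, lower, upper):
    """One iteration of rule (2): move along -s with step size rho, then project."""
    trial = [xi - rho * si for xi, si in zip(x, s)]
    return project_onto_box(trial, lower, upper)
```

For example, `project_onto_box([2.0, -3.0, 0.5], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0])` clips the first two components to the box and leaves the third unchanged.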



2.1 Gradient approximation by using RSM 



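The basic idea of stochastic approximation methods is that the whole iteration
process is based on estimates of functions and derivatives. Thus, a gradient
approximation method based on only a few function evaluations of $f$ is
needed. One method of that type is the well-known response surface method
(RSM).

Following the Taylor expansion of $F$ at $\bar{x}$,

$$F(x) = \sum_{k=0}^{\nu} \frac{1}{k!} \nabla^k F(\bar{x}) \cdot (x - \bar{x})^k + R_\nu(x, \bar{x}), \qquad (4)$$

where

$$\nabla^k F(\bar{x}) \cdot h^k := \left. \frac{d^k F(\bar{x} + t h)}{dt^k} \right|_{t=0} \qquad (5)$$

and

$$R_\nu(x, \bar{x}) := \frac{1}{(\nu+1)!} \nabla^{\nu+1} F(\bar{x} + \vartheta (x - \bar{x})) \cdot (x - \bar{x})^{\nu+1} \quad \text{with } \vartheta \in (0, 1), \qquad (6)$$

a symmetrical polynomial model $\hat{F}_\nu$ of degree $\nu$, given by (7), is
used to approximate the behaviour of $F$ near $\bar{x}$.

Then, a suitable gradient estimate is just the gradient of the model function
$\hat{F}_\nu$ at $\bar{x}$:

$$\widehat{\nabla F} := \nabla \hat{F}_\nu(\bar{x}). \qquad (8)$$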

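To estimate the unknown model parameters $\beta_{i_1 \ldots i_k}$, unbiased
estimates of $F$ are taken at $p$ design points $x^{(i)}$, $i = 1, \ldots, p$,
where $p \ge \binom{r+\nu}{\nu} =: r_\nu$, which is the number of parameters
involved up to the $\nu$-th degree. An important example is given by

$$u_i := \frac{1}{K} \sum_{l=1}^{K} f(x^{(i)}, a(\omega_l)), \quad i = 1, \ldots, p, \qquad (9)$$

where $a(\omega_l)$, $l = 1, \ldots, K$, are independent realizations of the
stochastic parameter vector.

According to those estimates, the model function $\hat{F}_\nu$ is fitted by
the least squares method:

$$\min \sum_{i=1}^{p} \left( u_i - \hat{F}_\nu(x^{(i)}) \right)^2. \qquad (10)$$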

The resulting gradient approximation can be written as the product of a
certain matrix $L_\nu$ and the vector of function estimates
$u = (u_1, \ldots, u_p)^T$, i.e.

$$\widehat{\nabla F} = L_\nu u. \qquad (11)$$

The matrix $L_\nu$ contains certain combinations of the difference vectors
$d^{(i)} := x^{(i)} - \bar{x}$, $i = 1, \ldots, p$. For an explicit formula,
cf. [5].
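
The chain from sample means over a least squares fit to a gradient estimate can be sketched for the simplest case, a linear model ($\nu = 1$) on a two-level full factorial design. This is an illustration under assumptions, not the explicit formula of [5]: because the design columns are orthogonal and sum to zero, the least squares slope for coordinate $j$ reduces to $\sum_i u_i d^{(i)}_j / \sum_i (d^{(i)}_j)^2$. All function names, the test function and the noise model are hypothetical.

```python
import itertools
import random


def sample_mean(f, x, draw_a, K, rng):
    """Estimate F(x) = E f(x, a) by averaging K independent realizations."""
    return sum(f(x, draw_a(rng)) for _ in range(K)) / K


def rsm_gradient_linear(f, x_bar, mu, draw_a, K, rng):
    """Gradient of a least squares linear model around x_bar.

    Design points are x_bar + mu * s for all sign vectors s in {-1, +1}^r
    (two-level full factorial design, an assumption of this sketch).
    """
    r = len(x_bar)
    signs = list(itertools.product((-1.0, 1.0), repeat=r))
    u = [
        sample_mean(f, [xb + mu * s for xb, s in zip(x_bar, sgn)], draw_a, K, rng)
        for sgn in signs
    ]
    p = len(signs)
    # Orthogonal design: slope_j = sum_i u_i * (mu * s_ij) / (p * mu**2)
    return [
        sum(ui * sgn[j] for ui, sgn in zip(u, signs)) / (p * mu)
        for j in range(r)
    ]
```

With a noise-free test function such as `f = lambda x, a: (x[0] - 1.0) ** 2 + 2.0 * x[1] + a` and `draw_a = lambda rng: 0.0`, the estimate at the origin reproduces the exact gradient, since the symmetric design cancels the quadratic term.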



2.2 Error analysis 

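Under some standard assumptions on the function estimates $u$ defined in (9),
the mean approximation error for the gradient can be written as

$$e_{det} := E\left(\widehat{\nabla F} - \nabla F(\bar{x})\right) = L_\nu R_\nu, \qquad (12)$$

where $R_\nu := (R_\nu(x^{(1)}, \bar{x}), \ldots, R_\nu(x^{(p)}, \bar{x}))^T$ is
the vector of remainders in the Taylor expansion (6).

A measure for the stochastic approximation error is the trace of the
covariance matrix of the difference between gradient and gradient
approximation:

$$e_{stoch} := \mathrm{tr}\left[ \mathrm{Cov}\left(\widehat{\nabla F} - \nabla F(\bar{x})\right) \right]. \qquad (13)$$

If

$$\sigma_i^2 := E\left( f(x^{(i)}, a(\omega)) - F(x^{(i)}) \right)^2, \quad i = 1, \ldots, p, \qquad (14)$$

we have that

$$e_{stoch} = \mathrm{tr}\left[ L_\nu \cdot \mathrm{diag}\left(\frac{\sigma_i^2}{K}\right) \cdot L_\nu^T \right]. \qquad (15)$$

For the deterministic error we find

$$\|e_{det}\|^2 \le \|L_\nu L_\nu^T\| \cdot \frac{1}{((\nu+1)!)^2} \cdot \sup_{x \in \tilde{D}} \|\nabla^{\nu+1} F(x)\|^2 \cdot \sum_{i=1}^{p} \|d^{(i)}\|^{2(\nu+1)} \qquad (16)$$

with $\tilde{D} = \{\bar{x} + t d^{(i)} \mid i \in \{1, \ldots, p\}, t \in [0, 1]\} \cap D$,
and an upper bound for the stochastic error is given by

$$e_{stoch} \le \|L_\nu L_\nu^T\| \cdot \max_{i = 1, \ldots, p} \sigma_i^2; \qquad (17)$$

note that both error estimates contain the factor $\|L_\nu L_\nu^T\|$.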

Now the main goal of an efficient algorithm is to decrease these two bounds, 
(16), (17), simultaneously, step by step. This task is related to the research 
field called experimental design [1]. 



2.3 Experimental design 

The accuracy of the gradient approximation mainly depends on the position of
the design points $x^{(i)}$, $i = 1, \ldots, p$, with respect to $\bar{x}$.
First we assume that the distance of the design points to $\bar{x}$ is fixed.
Then there are, depending on the degree $\nu$ of the model function, lower
bounds for the number $p$ of design points. According to this number there
exist a few possibilities to choose "good" design directions $d^{(i)}$.

If we choose a linear approximation, i.e. $\nu = 1$, then a regular simplex
(Fig. 1) is the best solution in a certain sense, but (fractional) factorial
designs (Fig. 2) are good selections, too. In particular, if we change to a
quadratic approach during the iteration process, then we can reuse some
results from the linear approximation, for example in a central composite
design (Fig. 4).

Usually $p$ is greater than $r_\nu$, but the additional design points can be
used, for example, to decide whether a linear or a quadratic approach is
suitable. Another possible design is the so-called Koshal design (Fig. 3).
Here the number of design points equals exactly the lower bound, but the
design is neither orthogonal nor rotatable, which are two important features
of a suitable design.

Figure 3: Koshal design



Figure 4: Central composite design 



2.4 Influence of the distance $\|x^{(i)} - \bar{x}\|$

Suppose that the design vector $d^{(i)} = x^{(i)} - \bar{x}$ is replaced by
$\tilde{d}^{(i)} := \mu \cdot d^{(i)}$, $\mu > 0$, such that the new design
points are

$$\tilde{x}^{(i)} = \bar{x} + \mu (x^{(i)} - \bar{x}) = \mu x^{(i)} + (1 - \mu) \bar{x}. \qquad (18)$$

Then it is easy to prove that

$$\tilde{L}_\nu = \frac{1}{\mu} L_\nu \qquad (19)$$

and therefore

$$\|\tilde{L}_\nu \tilde{L}_\nu^T\| = \frac{1}{\mu^2} \|L_\nu L_\nu^T\|, \qquad (20)$$

where $\tilde{L}_\nu$ results from $L_\nu$ by replacing $d^{(i)}$ by
$\tilde{d}^{(i)}$, $i = 1, \ldots, p$.

According to the new design points there is also a change in the bound for
the remainder term in the Taylor expansion, cf. (6) and (16),

$$\|\tilde{R}_\nu\|^2 \le \mu^{2(\nu+1)} \cdot \frac{1}{((\nu+1)!)^2} \cdot \sup_{x \in \mu \tilde{D}} \|\nabla^{\nu+1} F(x)\|^2 \cdot \sum_{i=1}^{p} \|d^{(i)}\|^{2(\nu+1)}. \qquad (21)$$

Consequently, (20) and (21) yield a new bound for the deterministic error:

$$\|\tilde{e}_{det}\|^2 \le \mu^{2\nu} \cdot \|L_\nu L_\nu^T\| \cdot \sup_{x \in \mu \tilde{D}} \|\nabla^{\nu+1} F(x)\|^2 \cdot \frac{1}{((\nu+1)!)^2} \sum_{i=1}^{p} \|d^{(i)}\|^{2(\nu+1)}. \qquad (22)$$

There is no general rule for how the factor $\sigma_i^2$ (see (14)) is
influenced by reducing or enlarging the distance of $x^{(i)}$ relative to
$\bar{x}$. Thus, the new bound for the stochastic approximation error is given
by

$$\tilde{e}_{stoch} \le \frac{1}{\mu^2} \|L_\nu L_\nu^T\| \cdot \max_{i = 1, \ldots, p} \tilde{\sigma}_i^2, \qquad (23)$$

where $\tilde{\sigma}_i^2$ is an upper bound of the variance $\sigma_i^2$ under
consideration. Regarding the inequalities (22) and (23), the deterministic and
the stochastic approximation error behave in opposite ways with respect to a
reduction factor $\mu < 1$: while the deterministic error is lowered, the
stochastic error is enlarged. A factor $\mu > 1$ would cause the contrary.
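
The opposite behaviour of the two bounds suggests a balancing value of $\mu$. As a rough illustration only, suppose that the deterministic bound is abbreviated as $A \mu^{2\nu}$ and the stochastic bound as $C / \mu^2$, with constants $A, C > 0$ that do not depend on $\mu$, and that the two are simply added. These are assumptions made here; in the bounds above the supremum and the variance bound may themselves change with $\mu$.

```latex
B(\mu) := A\,\mu^{2\nu} + \frac{C}{\mu^{2}}, \qquad
B'(\mu) = 2\nu A\,\mu^{2\nu-1} - \frac{2C}{\mu^{3}} = 0
\;\Longleftrightarrow\;
\mu^{*} = \left(\frac{C}{\nu A}\right)^{\frac{1}{2\nu+2}} .
```

For a linear model ($\nu = 1$) this gives $\mu^{*} = (C/A)^{1/4}$, so under these assumptions a larger noise level $C$ calls for a larger distance of the design points.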



2.5 Hessian approximation 

An essential feature of the present algorithm is that it also makes use of
second order information. Hence, it is necessary to approximate the Hessian of
the objective function $F$ as well, again based on only a few function
estimates.

Starting with a positive definite matrix $H_0$, the matrix
$H_n \approx \nabla^2 F(x_n)$ is updated by the recursion (24), which uses the
stochastic central difference quotient (25) with increment $2 c_n$, where

$$g_n^{(i)} := \widehat{\nabla F}(z_n^{(i)}), \quad i = 1, \ldots, 2 l_n, \qquad (26)$$

are approximations of the gradient of $F$ at special points $z_n^{(i)}$,
defined in (27), where

• $j(n, i)$, $i = 1, \ldots, l_n$, $n = 1, 2, \ldots$, runs through
$\{1, \ldots, r\}$ cyclically,

• $e_k \in \mathbb{R}^r$, $k = 1, \ldots, r$, denotes the $k$-th unit vector,

• and $c_n$ is derived from a certain $c_0 > 0$.

Figure 5: Design points for Hessian approximation

According to (8), we get the approximations $g_n^{(i)}$ of the gradient using
the design points (28), cf. (9), which are built from fixed design vectors
$d^{(j)}$, $j = 1, \ldots, p$, scaled by a factor $\mu_n$.

In Fig. 5 the situation is shown for $r = 2$, $l_n = 1$ and $p = 3$. Thus,
using a regular simplex design in every iteration step, the function $f$ has
to be evaluated $6 \cdot K_n$ times.

To prove convergence, the approximation of the expectation in (9) has to
become more accurate as the iteration number grows. Thus, the number $K_n$ of
function evaluations at each design point is increased with $n$ according to
rule (29), a rounded-up multiple of a power of $n$ with a suitable initial
value $K_0$.

2.6 Implementation of the algorithm

During the development of a suitable stochastic approximation method which
makes use of second order information, it became clear that the quality of a
Hessian approximation is not very high in the starting phase of the algorithm.
Thus, a partition into several phases is necessary in order to get a robust
method.



2.6.1 Phase 1 

There is only little information about the objective function during the first
few iterations. Hence, a simple gradient step with a plain step size rule is
sufficient:

$$s_n := \widehat{\nabla F}_n \;\; \text{(gradient approximation according to Section 2.1)}, \qquad \rho_n := \frac{\rho_0}{n} \;\; \text{with a suitable initial step size } \rho_0. \qquad (30)$$

We change to the second phase if either a fixed number $n_1$ of iterations is
reached or the relative distance between two successive iteration points is
smaller than a given value $\varepsilon_1$.
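
A minimal sketch of such a first phase for box constraints is given below. It is an illustration only: the function name `phase_one`, the callback `grad_estimate` (any routine returning a gradient approximation), and the definition of the relative distance as $\|x_{n+1} - x_n\| / \|x_{n+1}\|$ are assumptions made here, not taken from the paper.

```python
import math


def phase_one(grad_estimate, x1, lower, upper, rho0, n1, eps1):
    """Projected gradient steps with step size rho0 / n.

    Stops after n1 iterations, or earlier when the relative distance
    between successive iterates (assumed definition) drops below eps1.
    Returns the last iterate and the number of iterations performed.
    """
    x = list(x1)
    for n in range(1, n1 + 1):
        s = grad_estimate(x)                      # search direction s_n
        rho = rho0 / n                            # plain step size rule
        x_new = [
            min(max(xi - rho * si, lo), hi)       # projection onto the box
            for xi, si, lo, hi in zip(x, s, lower, upper)
        ]
        step = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_new, x)))
        scale = max(math.sqrt(sum(a * a for a in x_new)), 1e-12)
        x = x_new
        if step / scale < eps1:
            return x, n
    return x, n1
```

For instance, with the exact gradient of $\sum_i (x_i - 0.3)^2$ as `grad_estimate`, the start point `[0.9, 0.1]`, the unit box and `rho0 = 0.4`, the iterates approach the minimizer at 0.3 in each coordinate.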



2.6.2 Phase 2 

After some iterations within the first phase, it is more probable that the
iteration points already lie in a certain neighbourhood of the optimum. Hence,
one can try adaptive step size rules. At this stage of the algorithm, the
Hessian is approximated for the first time. Therefore the gradient
approximations $g_n^{(i)}$ at $z_n^{(i)}$, $i = 1, \ldots, 2 l_n$, cf. (26),
are sufficient for a gradient approximation at $x_n$. We choose

1 2ln 

S'- — 

^ (31) 

Pn •— Pn — \Qn^Pn-l’)^ri’)^n)^~^ Vn- 

Cn 

In the step size rule we have p„_i := min{r„,p„_i}, e.g. with r„ = 
and an arbitrary q G (0, 1). Furthermore, a convergence factor Q„ is selected 
according to 



Q{;S,T):[0,q-] 



l + r<5-r2F’ 



(32) 
Here, second order information is used in the form of estimates of the
eigenvalues of the approximation $H_n$ of the Hessian, namely
$\delta_n \approx 2 \lambda_{\min}(H_n)$,






2 



tv[Hn^y 

<^0, 



if det(/f„) > 0 , tr[if„] > 0 , and S„ € 
otherwise. 



(33) 



and r„ » A^ax(^n). e-g- 

r _ / (tr[i^n])^ ifr„<r 

” * \ To, otherwise. 



(34) 



A further part of the step size rule is an estimate of the stochastic error 



f 0 , ifCn <0 
Cn j ifCn ^ [OjCw] 

( Cui ifCn > Ctij 



with 




and finally we have a control parameter $\eta_n$, an exponential of a quantity
$\kappa_n$ scaled by a power of $n$, where $\kappa_n$ is given e.g. by

$$\kappa_n := \langle x_n - x_{n-1}, \, x_{n-1} - x_{n-2} \rangle. \qquad (37)$$

Hence, a zigzag course leads to small values of $\eta_n$, and straight courses
yield larger step sizes. We change to the third phase if both

$$n_{change} := \text{number of changes from Phase 3 back to Phase 2} \qquad (38)$$

does not exceed a fixed number and the relative difference between two
successive Hessian approximations is less than a given value $\varepsilon_2$,
because then we can suppose that the approximation is sufficiently stable to
use $H_n$ directly in the search direction.



2.6.3 Phase 3 



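As already mentioned, the search direction is now based directly on $H_n$,
such that we get a kind of quasi-Newton search direction. In this case the
convergence proof requires that

$$\lim_{n \to \infty} n \rho_n = 1 \quad \text{a.s.}$$

Hence, we get

$$s_n := \tilde{H}_n^{-1} \cdot \frac{1}{2 l_n} \sum_{i=1}^{2 l_n} g_n^{(i)}, \qquad \rho_n := \frac{\rho_{n-1}}{1 + \rho_{n-1}}, \qquad (39)$$

with $\tilde{H}_n$ as in (40).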
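
That a step size recursion of the form $\rho_n = \rho_{n-1} / (1 + \rho_{n-1})$ is compatible with the requirement $n \rho_n \to 1$ can be seen from $1/\rho_n = 1/\rho_{n-1} + 1$, i.e. $1/\rho_n = 1/\rho_0 + n$. The following short sketch checks this numerically; it assumes that the recursion has exactly this form, and the function name is hypothetical.

```python
def step_sizes(rho0, n_max):
    """Return [rho_1, ..., rho_n_max] for rho_n = rho_{n-1} / (1 + rho_{n-1})."""
    rho = rho0
    sizes = []
    for _ in range(n_max):
        rho = rho / (1.0 + rho)
        sizes.append(rho)
    return sizes


rhos = step_sizes(0.2, 10000)
n_times_rho = len(rhos) * rhos[-1]   # close to 1 for large n
```

The starting value `0.2` is only an example; the limit does not depend on it.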

For the sake of robustness we use the approximation $H_n$ of the Hessian only
if it fulfills a certain regularity condition, i.e.

$$\tilde{H}_n := \begin{cases} H_n, & \text{if } H_n \in \mathcal{H}, \\ H_0, & \text{otherwise,} \end{cases} \qquad (40)$$

where the set $\mathcal{H}$ of admissible matrices is defined, e.g., by a
condition on $\det(H)$ and a second condition involving a prescribed number,
cf. (41).

If the optimum of problem (1) lies on the boundary of the feasible region $D$,
then a convergence proof exists only if the search direction is a gradient
approximation. Thus, we change back to the second phase if
$x_n \in \partial D$.



3 Conditions for convergence and parameter selection

Many conditions have to be fulfilled to guarantee convergence of the
algorithm. Some of them are standard in optimization theory, some of them are
problem dependent, such as the proper selection of lower and upper bounds for
the estimates of the eigenvalues of the Hessian, cf. (33). A complete list of
convergence conditions is given in [5]. Here, only the most important ones are
quoted:

• lim ^/n = 0 with a problem dependent constant JI > 0, 

n->co 

• lim ^/n fj.^ = 0 for the convergence of the deteministic error and 

n— voo 

• — < C < 00 . to bound the stochastic error. 

- 

These conditions prescribe a special selection of $c_n$, $\mu_n$ and $K_n$,
respectively. A suitable choice is given by



11 1 

with j 

fio 

• f^n’-= 

n 




• Kn := \Ko • 




which contain the tuning parameters $c_0$, $\mu_0$ and $K_0$. An important
result of the research in this field is that the iteration course is greatly
influenced by the choice of those parameters. In the next sections some hints
for the proper selection of the tuning parameters are given.



3.1 Initial step size $\rho_0$

From the two contour plots in Fig. 6 and Fig. 7, the consequences of an
inappropriate selection of the initial step size $\rho_0$ in Phase 1 can be
seen.

Figure 6: Initial step size $\rho_0$ too small

Figure 7: Initial step size $\rho_0$ too large

To get a proper choice, a first condition has to be fulfilled, which is not
very restrictive. Namely, the bounds $x_l$ and $x_u$ of the box constraints
have to be selected such that the distance between the optimum and the initial
point is of the same magnitude as the distance between $x_l$ and $x_u$. This
supposition is justified by the fact that the initial point of the stochastic
optimization problem often results from a previous deterministic optimization
run. Hence, we define $\rho_0$ by (42), with $\tilde{\rho}_0 \approx 0.2$ and
$s_1$ defined in (30). Then, it holds

$$|x_2 - x_1|_i = \rho_0 \, |s_1|_i \le \tilde{\rho}_0 \cdot (x_u - x_l)_i, \qquad (43)$$

hence we may expect that $x_2$ does not leave the box.



( 43 ) 

A similar idea can be found in [2], but there no distinction is made between
the different components; this distinction, however, allows the handling of
rectangles with large width-to-height ratios.



3.2 The parameter $\mu_0$

A satisfactory iteration run requires above all good gradient approximations.
Hence, a very important parameter is the length of the design vectors
$\tilde{d}^{(i)}$, $i = 1, \ldots, p$, cf. Sect. 2.4, used for the RSM
estimation. An obvious aspect is that $\mu_0$ has to be chosen relative to the
distance between $x_l$ and $x_u$:

$$\mu_0 := \tilde{\mu}_0 \, \|x_u - x_l\| \quad \text{with } \tilde{\mu}_0 \approx 0.05. \qquad (44)$$

In contrast to deterministic algorithms, the relative distance
$\tilde{\mu}_0$ within stochastic approximation methods has to lie between
about 0.01 and 0.5, and not around very small values given by a negative
power of 10.

Another important result is obtained from Fig. 8, which shows the relative
error of the first gradient approximation as a function of $\tilde{\mu}_0$ for
different numbers $K$ of realizations of the random variables.

Figure 8: Influence of $\tilde{\mu}_0$ on the approximation error (relative
error of the first gradient approximation) for different values of $K$
(curves for $K$ = 1, 10, 100 and 1000)

The difference between the given curves is large on the left hand side and
small on the right hand side. This reflects the portions of the deterministic
error, cf. (12), and the stochastic error, cf. (13), in the total
approximation error. Greater values of $\mu_0$ reduce the stochastic error, as
mentioned in (23), such that on the right hand side mainly the deterministic
error is visible, which is independent of $K$. Lower values of $\mu_0$ yield a
lower deterministic error, cf. (22), and enlarge the stochastic error, which
causes the differences between the shown curves with respect to $K$ on the
left hand side of Fig. 8.

Fig. 9 represents the influence of the variance of the involved stochastic
parameters on the relative approximation error. The main result of our
investigation is that a greater coefficient of variation induces a lower
deterministic error, which is shown on the right hand side of the picture.
This effect surely depends on the special basic problem, which contains a
probabilistic objective function.

Figure 9: Influence of $\mu_0$ on the approximation error (relative error of
the first gradient approximation) for different coefficients of variation of
the stochastic parameters (curves for 1 %, 10 %, 20 %, 50 % and 75 %)



3.3 The parameter $c_0$

After the proper determination of the increment for the gradient
approximation, the position of the design points used for the Hessian
approximation is investigated. Here, the same argument holds concerning a
selection of $c_n$ in (27) relative to the distance between $x_l$ and $x_u$:

$$c_0 := \tilde{c}_0 \, \|x_u - x_l\| \quad \text{with } \tilde{c}_0 \approx 0.1. \qquad (45)$$

A general rule for the choice of $c_0$ is that $c_0$ should be greater than
$\mu_0$. Before starting the investigation of the proper selection of $c_0$, a
measure for the quality of the parameter has to be defined. One possibility is
to do one step with a pure quasi-Newton-like search direction and then to
consider the resulting deviation from the optimum. This was done in Fig. 10,
where the influence of the number of realizations $K$ on the selection of
$c_0$ is shown.

Figure 10: Influence of $c_0$ on the relative distance of the next iteration
point to the optimum for different values of $K$

As seen in Fig. 8, we again observe smaller differences on the right hand side
of the figure, which represents the deterministic approximation error, and
erratic behaviour on the left hand side, caused by greater stochastic errors.
Comparing the two figures, Fig. 8 and Fig. 10, we observe a greater
sensitivity towards the selection of $c_0$ than towards $\mu_0$.
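
The scaling rules for the two increments can be collected in a small helper. This is a sketch only: the function name and argument names are hypothetical, the default relative values 0.05 and 0.1 are the approximate values quoted in the text, and the Euclidean norm is assumed for the distance between the bounds.

```python
import math


def tuning_parameters(x_lower, x_upper, mu_rel=0.05, c_rel=0.1):
    """Scale the relative increments by the distance between the box bounds.

    Returns (mu_0, c_0). Enforces the rule of thumb that the increment for
    the Hessian approximation exceeds the one for the gradient approximation.
    """
    if c_rel <= mu_rel:
        raise ValueError("c_rel should be greater than mu_rel")
    width = math.sqrt(sum((u - l) ** 2 for u, l in zip(x_upper, x_lower)))
    return mu_rel * width, c_rel * width


mu_0, c_0 = tuning_parameters([0.0, 0.0], [3.0, 4.0])
```

For the example box the distance between the bounds is 5, so the two increments are 0.25 and 0.5.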



3.4 Comparison of RSM with other methods 

Finally, we compare our algorithm with other well-known methods. The
underlying problem is based on a fibre reinforced structure. The deterministic
task was to minimize the total thickness under some failure criteria. The
substitute problem, which considers the stochastic character of some
parameters, is of the type (1), where the objective $f$ is a weighted sum of
the original objective function and the loss due to a violation of the failure
restrictions. The penalty functions taken into account lead to formulations
which contain a probability function; hence a great number of structural
analyses is required to approximate the objective function $F$ of the
substitute problem.

Figure 11: Comparison of different methods for solving stochastic
structural optimization problems

In Fig. 11 the relative distance of the last iteration point to the optimum is
plotted against the maximum allowed number of evaluations of the function $f$.
Three different methods are compared. We can conclude that none of these
algorithms is clearly better than any of the others. The advantage of RSM,
however, is that the accuracy of the approximation improves during the
iteration process, while the other methods, which approximate the objective
function by a Taylor polynomial, always produce an unknown truncation error.

4 Summary

A new stochastic approximation method is presented which makes use of the
response surface technique to estimate the gradient of the objective function.
After a short description of an appropriate selection of so-called design
points, search directions and step size rules of the algorithm are
established which are based on second order information. The necessary
partition of robust algorithms into several phases is explained, and
conditions for phase changes are specified. A detailed error analysis and the
resulting conditions of convergence lead to explicit formulas for some of the
control parameters. Those parameters themselves are based on some tuning
parameters, which greatly influence the convergence rate. This is illustrated
in some diagrams. An appropriate adaptive choice for those parameters is
presented. Finally, the new implementation is compared with other methods for
solving stochastic structural optimization problems. This comparison shows no
great differences, which indicates that RSM is an adequate method for solving
problems of this type, provided that the involved parameters are selected
suitably.

Abstract. Response surface methods are used to construct explicit approximations
of objective and constraint functions in optimization problems. Serving as interface
between analysis code and optimization algorithm, the function approximations
provide a fast but approximate evaluation of response quantities in the optimization
process. Application to the quiet design of an engine air intake system is discussed.

1 Introduction

In structural optimization, often an approximation concept is introduced as interface 
between structural analysis code and optimization algorithm. Then approximation 
functions are built of all objective and constraint functions, based upon function 
values and, if available, sensitivities which follow from structural analysis 
calculations. In this way, the original optimization problem is replaced by an 
explicitly known approximate optimization problem. 

Local, global and mid-range approximation concepts can be distinguished. Local 
function approximations, which are most commonly applied, are based upon the 
function values and sensitivities in a single point of the design space. The 
approximate objective function and constraints then define an optimization 
subproblem. Since local approximations are only valid in a limited area of the 
design space, a new cycle of approximation and optimization can be started at the 
optimum of the subproblem. This procedure is repeated until an acceptable 
optimum is achieved, resulting in the popular process which is indicated as 
sequential approximate optimization. For some optimum design applications, it is 
profitable to build global or mid-range approximations, which are based on 
structural analysis results in multiple design points. Global approximations are 
valid in the whole design space, whereas mid-range approximations are valid in a 
substantially smaller part of the design space, but this part is expected to be larger 
than in the case of local approximations. In the optimization process global and
mid-range approximations are used in a different way. After the global approximate
model has been built, it is passed once to the optimization module to try and find
an optimum design. Mid-range approximations are used in a sequential
approximate optimization process, in a similar way as described above for local
approximations. This paper will be restricted to global and mid-range
approximation models.

For building the function approximation models several methods are available. 
In chapter 2 the so-called response-surface method will be explained. The 
application of this method to construct global and mid-range approximations will 
be illustrated with an example of the optimization of an engine air intake system. 



2 Function approximation concepts 

2.1 Global function approximation concept 

Response-surface modelling is a powerful tool to build global approximate models. 
These strategies were originally developed for the model fitting of physical 
experiments [1] but can be applied successfully in multidisciplinary optimization. 

Construction of response-surface models is an iterative process. One starts with 
postulating the approximate model functions. For this, the designer must know to 
some extent which variables play a role and which form is suitable to describe the 
relation between design variables and responses. A parametric study in the initial 
design phase may be helpful for deriving this relation. Usually one starts with 
simple function models to reduce initial computation costs, e.g. first or second 
order polynomials. 

Once the model functions are chosen, an experimental design is determined,
containing the points in the design space for which numerical experiments must be
carried out. In principle all real design variable values within the design variable
bounds are allowed to be chosen. For the purpose of efficiency, however, only a very
limited number of discrete values, called levels, of every design variable are chosen.
Many experimental design methods exist in literature, of which factorial designs are
most often used.

In a full factorial design a design point is placed on each possible combination of
levels of the design variables (Fig. 1). The choice of the number of levels for a
certain design variable depends on the order of the variable in the assumed
approximation model. A linear effect can be estimated by means of at least two
levels, a quadratic effect needs at least three levels, etc. Therefore, when the
number of levels or variables is increased, the total number of analyses can become
excessively large and unmanageable. Full factorial designs are applicable when the
response can be modelled by a simple approximation model or when a systematic
search of the design region is needed. To limit the number of analyses, one useful
approach is the addition of centre points. With only one extra design point three
levels can be estimated.

Fig. 1. Full factorial design for 2 design variables with centre point.
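
The construction of such a design is easy to sketch in code. The following illustration (function names are hypothetical) enumerates all level combinations with `itertools.product` and adds a single centre point to a two-level design, as in Fig. 1.

```python
import itertools


def full_factorial(levels_per_variable):
    """All combinations of the given levels, one design point per combination."""
    return [list(point) for point in itertools.product(*levels_per_variable)]


def two_level_with_centre(lower, upper):
    """Two-level full factorial design on the bounds plus one centre point."""
    corners = full_factorial([(lo, hi) for lo, hi in zip(lower, upper)])
    centre = [(lo + hi) / 2.0 for lo, hi in zip(lower, upper)]
    return corners + [centre]


design = two_level_with_centre([0.0, 0.0], [1.0, 1.0])
```

For two design variables this gives the four corner points and the centre, i.e. five analyses, whereas a three-level full factorial design for three variables already needs 27.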

If the number of design variables becomes large, the desired information can
often be obtained by using a fraction of a complete full factorial design, called a
fractional factorial design. However, not all combinations of variables can be
estimated in this way. For example, for four design variables a fraction can be
obtained by constructing a complete full factorial design for three variables. Values
of the fourth variable are generated using a combination of the previous three
variables. As a consequence, the fourth variable and this combination cannot
be estimated independently and one of them should be removed from the
function model. Applying fractional designs results in two or more fractions
(i.e. lists of design points) which together form a complete full factorial
design. In the first run one can use, for example, the first fraction; if one is not
satisfied with the obtained approximation models, a second fraction can be
analyzed, and so on.
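
The four-variable example can be sketched as follows. The particular generator used here, x4 = ±x1·x2·x3 in coded levels -1/+1, is an assumption chosen for illustration; the text does not state which combination of the first three variables is used.

```python
from itertools import product

def half_fraction_four_variables(sign=1):
    """Half fraction of a 2-level design in 4 variables.

    A full factorial is built for x1..x3; x4 is generated as sign * x1 * x2 * x3,
    so x4 cannot be estimated independently of that three-variable combination.
    """
    return [(a, b, c, sign * a * b * c) for a, b, c in product((-1, 1), repeat=3)]

first = half_fraction_four_variables(+1)   # fraction used in the first run
second = half_fraction_four_variables(-1)  # complementary fraction, analysed if needed
full = set(product((-1, 1), repeat=4))

print(len(first), len(second), len(full))
```

The two fractions of 8 points each are disjoint and together reproduce the complete 16-point full factorial design, so the analyses of the first run remain usable when the second fraction is added.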

Each experimental design method requires different knowledge about the 
problem. In any experimental design problem, a critical decision is the choice of 
sample size, that is, the number of analyses to be run. Generally, if one is 
interested in detecting local behaviour in a large part of the design space, more 
analyses are required than if one is only interested in the global behaviour. Also 
the complexity of the posed model functions and the number of design variables 
play an important role in selecting a suitable experimental design method. At a
minimum, the data obtained at the design points must allow the unknown
parameters of the approximation models to be estimated; hence the number of design
points should be equal to or greater than the number of unknowns. It is advised to start with a
simple model with a moderate number of parameters and requiring a moderate 
number of design points. In subsequent steps, gaining more knowledge about the 
behaviour of the responses to be approximated, extra design points can be added 
and more complex models can be used. It is important to note that all analyses 
remain valuable during this process. 

After determining the design points in the design space, analyses are performed 
to gather data, i.e. response values and/or sensitivity values. This offers 
possibilities for parallel computing. When all information has been collected, the 
model functions are fitted to the results of the numerical experiments. In matrix 
notation: 



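$$\mathbf{y}(\mathbf{x}) = \mathbf{F}\,\boldsymbol{\beta} + \boldsymbol{\varepsilon} \qquad (1)$$

with responses:

$$\mathbf{y}(\mathbf{x}) = [\,y_1(\mathbf{x}), \ldots, y_N(\mathbf{x})\,]^T \qquad (2)$$

the model functions:

$$\mathbf{F} = [\,f_1(\mathbf{x}), \ldots, f_k(\mathbf{x})\,]^T \qquad (3)$$

and unknown model parameters:

$$\boldsymbol{\beta} = [\,\beta_1, \ldots, \beta_m\,]^T \qquad (4)$$

Here $\boldsymbol{\varepsilon}$ are the errors between calculated and approximated function values, N is the
total number of responses, k is the number of function models and m the number
of unknown parameters. The estimated model parameters $\hat{\boldsymbol{\beta}}$ are calculated using
a least squares approach:

$$\hat{\boldsymbol{\beta}} = (\mathbf{F}^T \mathbf{F})^{-1}\,\mathbf{F}^T \mathbf{y} \qquad (5)$$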
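
As a minimal sketch of the least squares estimate, the following Python code fits a first order polynomial to a two-level factorial design with centre point. It assumes that each row of the regression matrix holds the model function values (1, x1, x2) at one design point; the synthetic responses and the helper names are illustrative only and do not come from the paper.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def fit_least_squares(rows, y):
    """Return the estimate (F^T F)^{-1} F^T y via the normal equations."""
    m = len(rows[0])
    ftf = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    fty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(m)]
    return solve_linear_system(ftf, fty)

# Linear model y = b0 + b1*x1 + b2*x2 on a 2-level factorial with centre point.
points = [(-1, -1), (1, -1), (-1, 1), (1, 1), (0, 0)]
rows = [[1.0, x1, x2] for x1, x2 in points]
y = [3.0 + 2.0 * x1 - 1.0 * x2 for x1, x2 in points]  # synthetic, noise-free responses

beta = fit_least_squares(rows, y)
print(beta)
```

Solving the normal equations directly is adequate for small, well-conditioned designs such as this one; for larger or nearly collinear models an orthogonal factorization would usually be preferred.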

The number of constraints in the problem can be large if the same function type
is used for each constraint: the system matrix is then decomposed only once and the
unknown parameters of each constraint are obtained by a multiplication with the
right-hand side containing its response values. When several different function types
are used, the computation time for the unknown parameters increases only slightly and
will in general be much less than that of one analysis with an analysis code.

To reduce the number of analyses, sensitivity data may be used in the model
fitting process. However, sensitivity data is not always available, or not always
available at low cost. It can even distort the global behaviour if the response is
non-smooth or contains discontinuities, and should not be used in that case. Since
response quantities and their sensitivities are two entities with different (physical)
dimensions, weighting factors must be used to express the importance of sensitivity
data with respect to response data.

Evaluation of the approximation models is required to check their validity. One 
possibility is to look at the absolute and relative errors between the exact analyses 
and the responses according to the approximation models. Also statistical measures 
can be computed to get an indication of the validity of the models. But remember, 
in this context experiments are computer experiments and therefore errors in the 
response-surface models cannot be supposed to be random. Validation of the 
functions should occur with design points not included in the fit. 
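
A simple validation check along these lines can be sketched as follows. The "exact" response and the approximation model used here are hypothetical stand-ins for an analysis code and a fitted response-surface model.

```python
def validation_errors(model, exact, points):
    """Return (point, absolute error, relative error) for each validation point."""
    report = []
    for x in points:
        exact_value = exact(x)
        abs_err = abs(model(x) - exact_value)
        rel_err = abs_err / abs(exact_value) if exact_value != 0 else float("inf")
        report.append((x, abs_err, rel_err))
    return report

def exact(x):
    """Hypothetical stand-in for an expensive analysis."""
    return 1.0 + x ** 2

def model(x):
    """Hypothetical linear approximation, built around x = 1."""
    return 2.0 * x

# Validation points that were not used to build the approximation.
validation_points = [0.5, 1.5]
report = validation_errors(model, exact, validation_points)
max_rel = max(rel for _, _, rel in report)

for x, abs_err, rel_err in report:
    print(x, abs_err, rel_err)
print("maximum relative error:", max_rel)
```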

Statisticians have developed a number of variable selection procedures to find the
best subset among a large number of specified variables or combinations of
variables. These selection procedures give useful information about the relation
between variables and responses and can serve as a guide in the function building
process. One of the selection methods is called stepwise regression [2]. At each
step, before the next variable to be added is determined, the F statistics for the
significance of the already chosen coefficients are examined to see whether a variable
should be eliminated. If the variance contribution of a variable in the regression
is insignificant at a specified F-level, this variable is removed from the regression.
If no variable is to be removed, the procedure checks whether the variance reduction
obtained by adding a variable to the regression is significant at a specified F-level.
If so, this variable is entered into the regression. An important property of the
stepwise regression procedure is that a variable which has entered the regression may
be removed at a later stage when it becomes insignificant, so that only
significant variables are included in the final regression. The procedure stops when,
for the specified significance levels, neither a selection nor an elimination of a
variable is indicated. The final results of the procedure are not unique, as they
depend on the choices of significance levels for addition and deletion.
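
The procedure described above can be sketched as follows. This is a simplified illustration, not the implementation of [2]: the thresholds `f_enter` and `f_remove`, the synthetic data and all function names are assumptions. The response depends on x1 and x2 only; a small three-variable interaction term plays the role of unmodelled behaviour.

```python
from itertools import product

def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def rss(columns, y, chosen):
    """Residual sum of squares for an intercept plus the chosen columns."""
    rows = [[1.0] + [columns[j][i] for j in chosen] for i in range(len(y))]
    p = len(rows[0])
    ftf = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    fty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    beta = solve(ftf, fty)
    return sum((yi - sum(bk * v for bk, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def stepwise(columns, y, f_enter=4.0, f_remove=3.9, max_steps=100):
    """Stepwise selection: check for elimination first, then for addition."""
    n, chosen = len(y), []
    for _ in range(max_steps):
        if chosen:
            full = rss(columns, y, chosen)
            dof = n - len(chosen) - 1
            partial_f = {j: (rss(columns, y, [c for c in chosen if c != j]) - full)
                            / (full / dof) for j in chosen}
            weakest = min(partial_f, key=partial_f.get)
            if partial_f[weakest] < f_remove:
                chosen.remove(weakest)
                continue
        current = rss(columns, y, chosen)
        best, best_f = None, f_enter
        for j in range(len(columns)):
            if j in chosen:
                continue
            new = rss(columns, y, chosen + [j])
            f_value = (current - new) / (new / (n - len(chosen) - 2))
            if f_value > best_f:
                best, best_f = j, f_value
        if best is None:
            break
        chosen.append(best)
    return sorted(chosen)

# Candidate variables x1, x2, x3 on a 2-level full factorial design.
points = list(product((-1.0, 1.0), repeat=3))
columns = [[p[j] for p in points] for j in range(3)]
y = [2.0 + 3.0 * a - 2.0 * b + 0.1 * a * b * c for a, b, c in points]

selected = stepwise(columns, y)
print("selected variable indices:", selected)
```

Choosing the F-level for entry no smaller than the F-level for removal is a common precaution against a variable being added and removed in alternation; as the text notes, different levels can lead to different final models.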

If further model improvement is necessary, one performs another model building
cycle consisting of design data collection, model fitting and testing. This is clearly
not an automated process, since the decisions depend on the user's knowledge and
desire to control the process. If the resulting models describe the response
behaviour accurately enough, they can be used as explicit problem functions in the
optimization.

The response-surface method is suitable in an early design stage where the 
number of design variables is relatively small (i.e. 10 to 15) and for problems with 
non-smooth response behaviour. However, global approximation concepts are not 
only valuable for the preliminary design investigation. After an optimum has been 
found, global approximation models built around the optimum can be used to
investigate changes in the optimization problem specifications like changes in 
design variables or constraint bounds without the need to run the analysis code 
once again. Another important feature is that noise and other irregularities can be 
averaged out through the smoothing capabilities of functions, avoiding multiple 
local minima and preventing premature convergence of the optimization algorithm. 



2.2 Mid-range function approximation concept 

Mid-range approximations [3] are designed to be valid in a smaller region than the 
region for global approximations. Such a region is bounded by movelimits. As a 
consequence, simpler model functions and fewer design points can be used in 
comparison with the global method. Also, contrary to the global method the 
mid-range approximations are constructed in a sequential way comparable to local 
methods. The differences with local methods however are that the initial 
movelimits are larger and previous analyses are used as much as possible to 
construct and enhance the approximations. 

The process of building mid-range approximation functions uses basically the 
same scheme as building globally valid function models. First, model functions 
must be selected for each response which can be, for example, simple linear or 
reciprocal. A restricted number of design points is available for function 
construction, in principle only the design points within the movelimits. Hence the
number of parameters must be kept small, preferably in the order of the number 
of design variables. In general, model functions should be able to describe 
curvature well in order to speed up convergence. 

Two strategies can be distinguished for the choice of the design points the
models are fitted to (Fig. 2). The first strategy uses data from single points along
the iteration path in the design space, the so-called single-point-path (SPP) methods.
The design points are the solutions of the sequential optimization subproblems. All
points within the movelimits are used to build the approximation. As the
optimization progresses, more design points become available to fit the models to,
and the approximations are improved in each optimization cycle. The number of
design variables can be as large as for a local method.

Fig. 2. Design point placement in mid-range SPP (left) and MPP (right) method.

Approximations derived from data computed in clusters of design points along
the optimization path are called multi-point-path (MPP) methods. Around each
solution of the optimization subproblem one or more extra points are generated by
placing points according to simple experimental designs as discussed in section 2.1.
This approach is valuable if no sensitivity data can be used, for instance if
sensitivity data is not accurate enough or cannot be computed, and is
preferable to the calculation of finite differences. However, a restriction is
placed on the number of design variables to be used, since for each additional
design variable extra design points must be analyzed. The number of design
variables must be limited to keep the problem manageable.

Both mid-range methods require a robust movelimit strategy: the design region
should be large enough to use multiple points, but not so large that the
accuracy of the models deteriorates. The choice of the size of the design sub-region
is based on a mixture of accuracy of the approximation functions, constraint
violation and convergence behaviour of the objective function.

After the unknown parameters in the function models are estimated, the
approximate optimization problem is solved by an optimization algorithm. A large
number of constraints can be handled; their number is limited only by computer
resources. Next, a new approximation is made around this optimum, based on either
an SPP or MPP strategy, and the process is repeated until convergence occurs.
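
The sequential character of an MPP-type method can be illustrated with a deliberately simplified sketch. It is not the authors' algorithm: the toy objective, the rule of halving the move limits after an unsuccessful cycle, and all names and default values are assumptions. In each cycle a linear model is built from the current point plus one extra point per variable axis, the model is minimised within the move limits, and the new point is accepted only if the analysed response improves.

```python
import math

def noisy_objective(x):
    """Toy response: smooth bowl plus small high-frequency 'numerical noise'."""
    smooth = (x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2
    return smooth + 0.01 * math.sin(50 * x[0]) * math.sin(50 * x[1])

def mpp_optimize(f, x0, lower, upper, move=0.4, shrink=0.5, tol=1e-3, max_cycles=50):
    """Sequential linear approximation with move limits (simplified MPP sketch)."""
    n = len(x0)
    x, fx = list(x0), f(x0)
    evaluations = 1
    for _ in range(max_cycles):
        if move < tol:
            break
        # Linear model from the centre point plus one extra point per variable axis.
        slopes = []
        for i in range(n):
            step = 0.5 * move * (upper[i] - lower[i])
            if x[i] + step > upper[i]:
                step = -step  # stay inside the bounds
            trial = x[:]
            trial[i] += step
            slopes.append((f(trial) - fx) / step)
            evaluations += 1
        # The minimum of a linear model within the move limits lies in a corner.
        candidate = []
        for i in range(n):
            half = 0.5 * move * (upper[i] - lower[i])
            target = x[i] - half if slopes[i] > 0 else x[i] + half
            candidate.append(min(max(target, lower[i]), upper[i]))
        f_candidate = f(candidate)
        evaluations += 1
        if f_candidate < fx:
            x, fx = candidate, f_candidate   # accept the improved design
        else:
            move *= shrink                   # reduce the move limits and retry
    return x, fx, evaluations

x_start = [0.9, 0.1]
x_best, f_best, n_eval = mpp_optimize(noisy_objective, x_start, [0.0, 0.0], [1.0, 1.0])
print(x_best, f_best, n_eval)
```

Because the extra points are placed at a distance comparable to the move limits rather than at a finite-difference step, the estimated slopes reflect the mid-range trend of the response instead of its local noise.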



3 Example: Acoustic optimization 

3.1 Introduction 

The main objective of structural-acoustic design is to create quiet structures. 
Generally, such structures were designed on a trial and error basis and any design 
better than the original design was accepted as a satisfying, not necessarily the 
optimum, design. During the past decade attention has shifted to the use of 
optimization methods, leading to a more structured design approach. Serious 
difficulties that remain are the derivation of a correct analysis model to describe 
the physical (acoustic) behaviour of the structure and the inherent long computation 
time for acoustic analyses. 

As in most optimization problems, the success of the optimization and the 
acceptance of the final design depend on the formulation of the objective and
constraint functions. In [4] various objective function formulations for structural- 
acoustic plate design were considered. These formulations are valid for acoustic 
optimization problems in general. Sound power radiated from vibrating plates was 
optimized in [5]. The optimization problem was solved using a gradient based 
optimization routine. Since the problem had multiple minima, only local minima 
could be found and the possibility of using global optimization methods was 
suggested. In reference [6] an engine structure is optimized for minimum noise 
transmission. Response surface methods were used to construct explicit relations 
between design variables and responses. Closed form expressions between the 
transmission loss of sandwich panels and the design variables were derived in [7]. 
The authors stated that methods which spread the search of the optimum over the 
entire design space have more chance of finding the global optimum than methods 
that rely on perturbations near the current design. 

The acoustic optimization problem that will be considered here is described in 
[8]. It comprises most of the aspects encountered in acoustic optimization 
problems: no sensitivity data readily available and a noisy objective function 
behaviour. On the other hand, the mathematical model of the air intake system
describes the physical model sufficiently accurately and computation time for the
acoustic analysis is relatively short.



3.2 Acoustic model of engine air intake system

In Fig. 3 a four-cylinder engine air intake system is schematically represented by 
straight tubes. It consists of four air intake runners, one for each of the four 
cylinders of the engine. The air cleaner is connected to the intake runners by a 
crossover tube and a throttle body. To the air cleaner an inlet snorkel is mounted. 



Fig. 3. Model of four-cylinder engine air intake system [8]. Labelled parts: intake
runner (pressure input), crossover tube, throttle body, air cleaner, inlet snorkel
(pressure output), closed end.

The noise produced by the system results from the opening and closing of the 
inlet valves of the engine cylinders. If one of the inlet valves opens, the pressure 
in the cylinder is above atmospheric pressure and a positive pulse sets the air in 
the inlet system into oscillation. The output pressure at the inlet snorkel is a 
function of the frequency at which the valves open and close. This frequency is 
directly related to the number of revolutions at which the engine is running. At 
some specific number of revolutions the natural frequencies of the inlet system are 
matched and the manifold transmits noise very efficiently. 

Plane wave theory (see [9]) is used to derive the relation between the input 
pressure at the intake runner and output pressure at the inlet snorkel. Part of the 
equations involved are derived in the appendix. A fluctuating pressure of unit
magnitude is assumed at the input while the other three valves are closed. The resulting set of
complex equations is solved by the NAG-routine F04ADF. 

Experiments on the real air intake system showed good agreement with the
analytical model [8]. The small differences can be explained by the non-smooth
junctions and the neglect of mean flow in the tubes. Also, acoustic damping and
wall absorption are not taken into account. A comparable system was optimized in
[10] using the finite element method. By adding volumes to the initial system, the
pressure amplitudes decreased but more eigenfrequencies were introduced.



3.3 Optimization problem formulation

Table 3.1 shows the initial lengths and diameters of the tubes. Only the lengths of 
four of the tubes are taken as design variables in this study and vary between the 
bounds given. 





                   initial length (m)    bounds (m)     diameter (m)
intake runners     L_a = 0.216           0.15 - 0.51    S_a = 0.034
crossover tube     L_b = 0.152
throttle body      L_c = 0.063           0.05 - 0.50    S_c = 0.044
air cleaner        L_d = 0.152           0.10 - 0.30    S_d = 0.251
inlet snorkel      L_e = 1.016           0.51 - 1.50    S_e = 0.076

Table 3.1. Initial design of air intake system.



The most undesirable frequencies lie in the range from 50 to 250 Hz. To obtain
a low noise transmission in this range, the average output pressure between 50 and
250 Hz is taken as the objective function U:

$$U = \frac{1}{n}\sum_{i=1}^{n} p_{out}(f_i) \qquad (6)$$

where $p_{out}(f_i)$ is the output pressure at frequency $f_i$ and n is the number of
frequency intervals used in the summation. In [8] a frequency step of 5 Hz was used to
obtain reasonable computation times, but this caused the objective function to be
highly non-smooth. A smaller step size
decreases the non-smoothness of the function. Replacing the summation by an
integration (e.g. Runge-Kutta integration with variable step size control) even
removes the non-smoothness completely. To maintain the noisy behaviour of the
objective function and to represent the actual response as closely as possible, a step
size of 1 Hz is chosen. The objective function value for the initial design is 0.040.
Sensitivity data is not directly available and, as will be seen in section 3.4, it is not
wise to include it in the optimization process. No further constraints are imposed
on the problem.
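
The effect of the frequency step on such an averaged objective can be sketched as follows. The pressure function below is a hypothetical stand-in with three sharp resonance peaks, not the plane wave model of the intake system, and sampling both end points of the range is an assumption.

```python
import math

def average_output_pressure(pressure, f_min=50.0, f_max=250.0, step=1.0):
    """Mean of pressure(f) sampled every 'step' Hz from f_min to f_max inclusive."""
    count = int(round((f_max - f_min) / step)) + 1
    return sum(pressure(f_min + i * step) for i in range(count)) / count

def toy_pressure(f, resonances=(83.0, 141.0, 207.0), damping=0.01):
    """Hypothetical output-pressure amplitude with three sharp resonance peaks."""
    total = 0.0
    for f0 in resonances:
        r = f / f0
        total += 1.0 / math.sqrt((1.0 - r * r) ** 2 + (2.0 * damping * r) ** 2)
    return total

u_coarse = average_output_pressure(toy_pressure, step=5.0)  # 41 samples
u_fine = average_output_pressure(toy_pressure, step=1.0)    # 201 samples
print(u_coarse, u_fine)
```

With a coarse step a narrow peak may fall between two sample frequencies or exactly on one, so a small shift of a resonance caused by a design change can alter the sum abruptly. This is the mechanism behind the non-smooth objective described above.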



3.4 Optimization results 

To gain some insight into the objective function behaviour, the optimization problem
is first solved with the global function approximation method. A full factorial
design on two levels results in a total of 16 design points, positioned at the bounds
of the design space. A polynomial with only first order main terms is fitted to the
results of these 16 design points. The maximum difference between exact and
approximated values is 66%. Since the approximation forms a plane in the
four-dimensional design space, the optimum is found in a corner of the design space,
x = (0.51, 0.05, 0.30, 0.51), where the approximated objective function value is
0.0074 and the exact value is 0.0058. Of course, one should at least add quadratic
terms to the function model and perform some additional analyses to obtain a
better explicit approximation and a more accurate optimum.
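
Why a first order model always places the optimum in a corner can be seen from a short sketch: the minimum of a linear function over a box is found by choosing, for each variable, the lower bound if its coefficient is positive and the upper bound otherwise. The bounds below are those of Table 3.1; the coefficient values are hypothetical, and only their signs are chosen to reproduce the corner reported in the text.

```python
def minimise_linear_model(coefficients, lower, upper):
    """Minimiser of b0 + sum(b_i * x_i) over a box: one bound per variable."""
    return tuple(lo if b > 0 else hi
                 for b, lo, hi in zip(coefficients, lower, upper))

# Bounds of the four design variables (intake runners, throttle body,
# air cleaner, inlet snorkel), in metres.
lower = (0.15, 0.05, 0.10, 0.51)
upper = (0.51, 0.50, 0.30, 1.50)

# Hypothetical first order coefficients; not values from the paper.
slopes = (-0.02, 0.01, -0.03, 0.005)

corner = minimise_linear_model(slopes, lower, upper)
print(corner)
```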

To be able to visualize the design space, the optimization process is limited to two 
design variables. Construction of a normal probability plot [6] from the 16 design 
points and a full polynomial with all possible interactions between variables reveals 
that design variables and are of main importance. In Fig. 4 the objective 
function is shown, based on a grid of 20 by 20 design points. 







Fig. 4. Design space for design variables and L^.. 



The design space is highly non-smooth with many local minima. A direct coupling 
of the analysis code with an optimization algorithm would almost certainly lead to 
premature convergence of the algorithm. The use of sensitivity data, if it were
available, could even worsen the process of approximation and optimization. 

Searching for the global optimum design in the above grid yields the design 
= 0.49, = 0.22 and the objective function value U = 0.0092. This point will
serve as a reference for further optimization.

Next, the mid-range function approximation method is applied to the two-dimensional
problem. Linear models are constructed in each optimization cycle. The initial
move-limit size is chosen to be 0.4 for each design variable direction, i.e. around
the initial design an area of 0.4 times the distance between the bounds on the
variable is chosen, and the move-limits are allowed to change in each cycle and for
each design variable separately. Choosing a smaller initial step size causes the
optimizer to get trapped in a local optimum. Additional design points are chosen
along each variable axis.

Convergence is achieved after 9 optimization steps and 25 objective function
evaluations (Fig. 5). The path followed by the optimizer in the design space is
shown in Fig. 6.

Fig. 5. Iteration history mid-range approximation method.

Fig. 6. Optimization path through design space (’o’ = ...; axis label: variable La).

The optimum design found is = 0.51, = 0.22, where the approximate objective
function value is U = 0.0093, which is close to the optimum found from searching the
20 by 20 grid. Starting the optimization from different starting points resulted in
some cases in local minima, but a restart requires only a moderate number of analyses.
Use of a larger initial move-limit does not automatically lead to faster convergence
with fewer analyses but, more importantly, the optimum in these cases lies close to
the global optimum. A more detailed study with all four design variables was carried
out and showed the optimum to be x = (0.51, 0.05, 0.30, 0.51), i.e. in a corner of the
design space, with U = 0.0058. Design variables L^j and appear to have no great effect
on the final objective function value. A large reduction in noise transmission can be
achieved compared to the initial design of the air intake system.



4 Conclusion

Function approximation concepts based on response surface methods are discussed. 
Response data from multiple design points are used in constructing the explicit 
approximations which serve as an interface between analysis code and optimization 
algorithm. The multi-point approximations appear to be useful if no sensitivity data 
is available or cannot be used due to non-smooth response behaviour. 

Application to a four-cylinder engine air intake system for minimum noise 
transmission is shown. Non-smooth response behaviour is caused by numerical 
noise which is effectively dealt with in the approximation process. 



Appendix

For the derivation of the system of equations, plane wave theory is used. The basic 
assumption is that the diameters of the tubes are small compared to the acoustic 
wavelength. Then the acoustic variables are functions of only one coordinate and 
the wave equation is: 



$$\frac{\partial^2 p}{\partial t^2} = c^2\,\frac{\partial^2 p}{\partial x^2} \qquad (A.1)$$

where p is the pressure and c is the speed of sound. Good agreement between
measurements and numerical modelling using plane wave theory was shown in [8].
A part of the complete air intake system is shown in Fig. 7.

Fig. 7. Part of plane wave model of four-cylinder induction system [8].



The general solution of the wave equation is:

$$p = A\,e^{i(\omega t - kx)} + B\,e^{i(\omega t + kx)} \qquad (A.2)$$

where A is the amplitude of the forward travelling pressure wave, B is the
amplitude of the backward travelling wave, $\omega$ is the frequency (rad/s) and k is the
wavenumber defined by $k = \omega/c - i\alpha$ ($\alpha$ is the absorption coefficient). Assuming
continuity of pressure and mass conservation at each junction, a set of 18 complex
equations can be derived describing the relation between input and output pressure
as a function of frequency $\omega$, where $A_j$ and $B_j$ (j = 1, ..., 9, the number of
junctions) are the 18 unknowns to be solved. At position 1, $x_1 = 0$, a
fluctuating input pressure of magnitude 1, $p_1 = e^{i\omega t}$, is applied, resulting in:

$$A_1 + B_1 = 1 \qquad (A.3)$$

Applying continuity of pressure at junction 2:

$$p_1(x_1 = L_a) = p_2(x_2 = 0) \;\Rightarrow\; A_1 e^{-ikL_a} + B_1 e^{ikL_a} = A_2 + B_2 \qquad (A.4)$$

$$p_3(x_3 = L_a) = p_2(x_2 = 0) \;\Rightarrow\; A_3 e^{-ikL_a} + B_3 e^{ikL_a} = A_2 + B_2 \qquad (A.5)$$

And mass conservation at junction 2:

$$S_a\,u_1(x_1 = L_a) + S_a\,u_3(x_3 = L_a) = S_2\,u_2(x_2 = 0) \qquad (A.6)$$

with the velocity:

$$u = \frac{1}{\rho_0 c}\left(A\,e^{i(\omega t - kx)} - B\,e^{i(\omega t + kx)}\right) \qquad (A.7)$$

yields:

$$S_a (A_1 + A_3)\,e^{-ikL_a} - S_a (B_1 + B_3)\,e^{ikL_a} = S_2 (A_2 - B_2) \qquad (A.8)$$

At closed end 3 (valve closed) the volume velocity is zero, resulting in:

$$A_3 - B_3 = 0 \qquad (A.9)$$

Other junctions are treated in the same way. At the open end of the inlet snorkel
(junction 10) a radiation impedance Z exists since the tube radiates sound into the
surrounding medium. Assuming a flanged pipe, and with Z normalised by $\rho_0 c$:

$$Z = \frac{(ka)^2}{2} - \frac{(ka)^4}{12} + i\left(\frac{8ka}{3\pi} - \frac{32(ka)^3}{45\pi}\right) \qquad (A.10)$$

where a is the radius of the snorkel end. Since Z = p/u:

$$A_9 (1 - Z)\,e^{-ikL_e} + B_9 (1 + Z)\,e^{ikL_e} = 0 \qquad (A.11)$$

where $L_e$ is the length of the inlet snorkel, and the exit pressure becomes:

$$p_{exit} = A_9\,e^{-ikL_e} + B_9\,e^{ikL_e} \qquad (A.12)$$
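
To illustrate how such a set of complex equations is assembled and solved, the following sketch treats a much simpler system than the one in the paper: a single straight tube with unit input pressure at x = 0 and a flanged open end at x = L. Only two unknowns A and B remain, with the equations A + B = 1 and A(1 - Z)e^{-ikL} + B(1 + Z)e^{ikL} = 0, where Z is the radiation impedance normalised by ρ0·c. The speed of sound of 343 m/s, the zero absorption coefficient, the use of ω/c in the impedance series and the use of Cramer's rule instead of a library routine are all assumptions of this sketch.

```python
import cmath
import math

def flanged_radiation_impedance(ka):
    """Low-frequency series for a flanged pipe end, normalised by rho0 * c."""
    resistance = ka ** 2 / 2 - ka ** 4 / 12
    reactance = 8 * ka / (3 * math.pi) - 32 * ka ** 3 / (45 * math.pi)
    return complex(resistance, reactance)

def exit_pressure(frequency_hz, length, radius, c=343.0, alpha=0.0):
    """Return (A, B, p_exit) for one tube with unit input pressure at x = 0."""
    omega = 2 * math.pi * frequency_hz
    k = omega / c - 1j * alpha
    z = flanged_radiation_impedance((omega / c) * radius)
    e_minus = cmath.exp(-1j * k * length)
    e_plus = cmath.exp(1j * k * length)
    # Two complex equations in the unknown amplitudes A and B:
    #   A + B = 1                                   (unit input pressure)
    #   A (1 - Z) e^{-ikL} + B (1 + Z) e^{ikL} = 0  (radiation condition)
    a11, a12, b1 = 1.0, 1.0, 1.0
    a21, a22, b2 = (1 - z) * e_minus, (1 + z) * e_plus, 0.0
    det = a11 * a22 - a12 * a21
    amp_a = (b1 * a22 - a12 * b2) / det   # Cramer's rule
    amp_b = (a11 * b2 - a21 * b1) / det
    return amp_a, amp_b, amp_a * e_minus + amp_b * e_plus

# Tube with the initial snorkel length and half the snorkel diameter as radius.
amp_a, amp_b, p_exit = exit_pressure(100.0, length=1.016, radius=0.038)
print(abs(p_exit))
```

For the full intake system the same two kinds of conditions, applied at every junction, lead to the 18 by 18 complex system mentioned above, which would be handed to a general linear solver rather than solved by hand.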



Abstract. NVH (Noise, Vibration, Harshness) is one of the main attributes in a
passenger car. One of the key systems responsible for the overall NVH behaviour 
is the powertrain mounting system. The optimal layout of this system leads to the 
definition of a stochastic optimization problem. In this paper the background of 
the complexity of the powertrain mounting system is highlighted. In a first 
approach the optimization problem is solved with Design of Experiments (DOE) 
methods. 

Keywords. DOE, Powertrain Mounting System, NVH, Scattering Mechanical 
Quantities, Vehicle Attributes 



1. Introduction 

The development of a passenger car is a multidisciplinary task. The vehicle has to
fulfill demands arising from different attributes such as vehicle dynamics,
driveability, acoustics, thermal and heat management, safety, crash and economics.
This paper concentrates on acoustics or, more generally, Noise, Vibration and
Harshness (NVH) of the vehicle. This area is currently one of the main attributes
defining overall performance and customer perception of a passenger car. Very often
demands from this area conflict with those of other areas. A main problem is the
variability of mechanical quantities describing the NVH performance of a car. To
overcome this, the extension of the conventional deterministic optimization problem
to a stochastic optimization problem is necessary. The powertrain mounting system,
which is the physical link between powertrain and vehicle, is one of the most
critical systems with respect to NVH. This paper therefore concentrates on aspects of
the optimal layout of the powertrain mounting system. Because of the complexity of
this single task, in a first approach a sensitivity calculation of the system is
performed with Design of Experiments (DOE) methods.

NVH vehicle system concepts (e.g. body structure, front and rear suspension,
powertrain mounting systems, etc.), which are selected in an early program phase,
have a significant influence on the NVH performance of the vehicle. It is almost
impossible to solve NVH concerns resulting from the selection of poor concepts in a
later program phase. A good understanding of the dynamics of the powertrain and its
interaction within the vehicle is needed to design and realize mounting systems
that allow the company to reach its NVH leadership goals. The selection of the
powertrain mounting concept and the design of the powertrain mounting system is a
highly complex task which requires the involvement of several areas.

In the following, demands for powertrain mounting systems, theoretical 
background, an overview on the design process as well as some practical details 
of the development process will be given. 



2. Demands for Powertrain Mounting Systems 

The powertrain mounts are the main links between powertrain and body structure. 
The main functional demands for the powertrain mounting system are: 

• Support of the powertrain under all load cases, e.g. 

- gravity weight 

- drive torque 

- tip in / back out¹ (torque change)

- switch on / switch off 

- acceleration, cornering 

- road impacts 

• Isolation of vibration due to 

- Engine excitation [1], e.g. idle, acceleration, cruise, overrun 

- Driving maneuvers, e.g. torque change, switch on / switch off 

- Road and wheel excitation 

The selection of the powertrain mounting system is heavily restricted by both
program assumptions and corporate demands:

• Powertrain concept or concepts

- Front-wheel-drive, rear-wheel-drive and/or all-wheel-drive

- North-South (N-S) or East-West (E-W)² powertrain installation

- Engine architecture, e.g. I4, V6, I3, I5, etc.

- Gas or Diesel engine

- Manual or automatic transmission

• Available package space³ (rock angle)

• Crash requirements

• Cost requirements / Investments

• Feasibility and manufacturing demands

• Durability

• Company cross-carline strategy

- reduced complexity

- unique parts

All demands for powertrain mounting systems as given above have to be taken
into account for selection of the mounting concepts. It is obvious that some of the
demands are contradictory.

¹ Short form for pressing or releasing the gas pedal.

² Indicates orientation of the powertrain in the vehicle. N-S is along the vehicle
x-axis, which is in driving direction.



3. Effects and Vehicle Systems Influenced by the Powertrain Mounting 
System 

A series of phenomena within the vehicle are mainly influenced by the powertrain
mounting system. The main NVH effects are given in Tab. 1.

Powertrain mounts have effects over a wide frequency range; excitation
amplitudes vary from several millimetres in the low frequency range to
micrometres in the high frequency range. As already stated above, the selection of
the engine mounting system has a severe influence on several other vehicle
systems.



4. Systems Engineering Approach 

Customer wants and customer satisfaction are the key points during the development
process of a new vehicle [2]. Extensive market research activities including
benchmarking, customer drives and translation from customer wording to 
objective measurables (quality function deployment QFD) result in target values 
for vehicle attribute performance. With respect to powertrain NVH, these total 
vehicle targets are e.g. interior noise level at specified driving conditions or idle 
vibration of the seat track. All vehicle development work, e.g. vehicle system 
selection, vehicle system optimization and component design is based on these 
target values, which become program objectives during the development process 
after confirmation of vehicle system selection. 



³ Package indicates the amount of available volume to arrange engine,
transmission and all other subsystems.

NVH Effect                        Frequency Range
Drive noise and vibration         20 - 500 Hz
Idle shake, vibration and boom    5 - 50 Hz
Road induced shake                10 - 15 Hz
Take-off judder                   10 - 30 Hz
Drive-away harshness              20 - 100 Hz
Switch on/off vibration           5 - 20 Hz
Tip in / back out                 3 - 10 Hz
Steering column shake             25 - 40 Hz
Powertrain boom and harshness     50 - 500 Hz

Table 1. NVH Effects Influenced by Powertrain Mounting System.

Figure 1. Target Setting for System and Components - Systems Engineering
(P/T = Powertrain).



Target cascading, which is mainly supported and performed by CAE techniques, 
is the process of deriving system and subsystem targets based on total vehicle 
targets. One aspect of target cascading for drive noise involving powertrain 
mounting system characteristics can be seen from Fig. 1. This graph shows some 
aspects of source identification, source ranking and transfer path analysis. The 
final outcomes of these exercises are targets for excitation levels and transfer
characteristics as well as targets and design specifications for components. 



Target cascading is, of course, performed for all vehicle attributes. If system
and component targets derived for different aspects are contradictory, trade-offs
have to be made at an early program stage to ensure high overall
customer satisfaction. Tab. 2 and Fig. 1 illustrate that specifications
for each vehicle system can be derived only by taking total vehicle
performance into account. One main objective of NVH development work is to avoid
coincident resonances within the vehicle. It is well known that the best isolation of
vibration is achieved if excitation frequencies are well above resonance
frequencies. With respect to powertrain mounting system characteristics, the rigid
body modes of the powertrain should be as low as possible, but above the 1/2 engine
order frequency, and should be well separated from overall vehicle system
resonances. Fig. 2 shows characteristic resonance and excitation frequencies of the
main vehicle systems. Only a small frequency window is available for the rigid
powertrain modes.
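
The isolation argument above can be illustrated with the textbook transmissibility of a linear single-degree-of-freedom isolator. The following is a minimal sketch, not the CAE procedure of the paper: the idle speed, the rigid-body mode frequency and the damping ratio are assumed values chosen only for illustration.

```python
import math


def order_frequency(rpm: float, order: float) -> float:
    """Frequency in Hz of an engine order at a given crankshaft speed."""
    return order * rpm / 60.0


def transmissibility(f_exc: float, f_res: float, zeta: float) -> float:
    """Force transmissibility of a linear single-degree-of-freedom isolator."""
    r = f_exc / f_res
    num = 1.0 + (2.0 * zeta * r) ** 2
    den = (1.0 - r * r) ** 2 + (2.0 * zeta * r) ** 2
    return math.sqrt(num / den)


# Assumed values for illustration only (not taken from the paper)
IDLE_RPM = 750.0   # idle speed in 1/min
F_MODE = 9.0       # rigid-body mode frequency in Hz
ZETA = 0.05        # damping ratio of the mount

f_half = order_frequency(IDLE_RPM, 0.5)     # 1/2 engine order at idle
f_second = order_frequency(IDLE_RPM, 2.0)   # 2nd engine order at idle
t_half = transmissibility(f_half, F_MODE, ZETA)
t_second = transmissibility(f_second, F_MODE, ZETA)

print(f"1/2 order: {f_half:.2f} Hz, transmissibility {t_half:.2f}")
print(f"2nd order: {f_second:.2f} Hz, transmissibility {t_second:.2f}")
```

In this sketch the mode lies above the 1/2 order frequency and well below the 2nd order frequency, so the 2nd order excitation is isolated (transmissibility below 1), while excitation below the resonance is not. Isolation only starts above sqrt(2) times the resonance frequency, which is why low rigid-body modes are desirable.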



For competitive vehicle NVH performance it is not sufficient
to design a powertrain mounting system that merely places all rigid powertrain modes
within the frequency gap marked in Fig. 2. Experience has shown that the mode shapes
and the frequencies of the individual modes play an important role in the quality of a
powertrain mounting system.
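
To make the notion of rigid powertrain modes inside a frequency window concrete, the sketch below computes the two natural frequencies of a rigid body on vertical springs. It is a deliberately reduced planar stand-in (bounce and rotation only) for the six rigid-body modes of a real powertrain; the mass, inertia, mount data and window limits are assumptions for illustration and are not taken from the paper.

```python
import math


def planar_rigid_body_modes(mass, inertia, mounts):
    """Return the two natural frequencies (Hz) of a rigid body on vertical springs.

    mounts: list of (stiffness in N/m, horizontal offset from the CG in m).
    Degrees of freedom: vertical bounce z and rotation theta about the CG.
    """
    k_zz = sum(k for k, _ in mounts)
    k_zt = sum(k * a for k, a in mounts)
    k_tt = sum(k * a * a for k, a in mounts)
    # Entries of A = M^-1 K for the diagonal mass matrix M = diag(mass, inertia)
    a11, a12 = k_zz / mass, k_zt / mass
    a21, a22 = k_zt / inertia, k_tt / inertia
    trace = a11 + a22
    det = a11 * a22 - a12 * a21
    # Closed-form eigenvalues of the 2x2 matrix A (squared circular frequencies)
    root = math.sqrt(trace * trace - 4.0 * det)
    eigenvalues = ((trace - root) / 2.0, (trace + root) / 2.0)
    return tuple(math.sqrt(lam) / (2.0 * math.pi) for lam in eigenvalues)


# Assumed example data (hypothetical, for illustration only)
MASS = 200.0                                  # kg
INERTIA = 10.0                                # kg m^2
MOUNTS = [(200.0e3, -0.3), (200.0e3, 0.3)]    # (N/m, m)
WINDOW = (6.25, 12.0)                         # assumed admissible window in Hz

f_low, f_high = planar_rigid_body_modes(MASS, INERTIA, MOUNTS)
modes_in_window = all(WINDOW[0] < f < WINDOW[1] for f in (f_low, f_high))
print(f"modes: {f_low:.2f} Hz and {f_high:.2f} Hz, inside window: {modes_in_window}")
```

Such a frequency check says nothing about mode shapes or coupling between the modes, which is the point made in the text: placing the frequencies in the window is necessary but not sufficient.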



5. Powertrain Mounting System Design Process 

Several departments have to be involved in the powertrain mounting
system selection and design process. This process is heavily driven by CAE
analyses [3], as no hardware is available at this early development stage.
The powertrain mounting system design process is illustrated in Fig. 3.

• The different powertrains are defined in the program assumptions. Full vehicle
performance requirements, e.g. NVH targets, are derived from the benchmarking
and target setting process.



[Figure 2 labels. Resonances: Body Movements (Rigid Body Modes); Drive Train
(Shuffle); Powertrain Mount System (Rigid Body Modes); Body (Bending / Torsion);
Chassis / Steering (Wheel Hop, Steering Column Bending). Excitations: Engine
Excitation (Idle 1/2 Order, Idle 2nd Order, Drive 2nd Order); Tyre / Wheel
Excitation (1st Order). Inline 4-Cylinder Engine.]




Figure 2. Typical Excitation and Resonance Frequencies of a Vehicle.

• Based on body, powertrain and chassis design, the available package and
experience from former vehicle programs (bookshelf knowledge), an initial
proposal for the powertrain mount positions and the powertrain mount
stiffnesses for idle load is made. These proposals have to be confirmed by
CAE analyses. As no detailed body information is available at this early stage
of the development program, the first CAE calculations are performed on a
linear, ‘grounded’ powertrain CAE model. The first set of design variables is
optimized to fulfill the modal demands.



[Figure 3 title: Powertrain Mount Design Process]




Figure 3. Simplified CAE-based Optimization of Engine Mounting Systems.

• Linear mount characteristics are not sufficient because of package constraints
(e.g. clearance for the maximum rock angle). Therefore, progressive mount
characteristics have to be designed. Non-linear, progressive mount stiffnesses
with the “zero-load” stiffness confirmed above are selected, which should
meet the following criteria:

- smooth transition from linear to progressive characteristics range, 

- drive in non-linear characteristics range in low gears only, 

- mount stiffness for drive as low as possible,

- use of maximum feasible deflection at maximum engine load only, 

- feasibility. 

CAE checks of the maximum rock angle and the powertrain mount forces of
non-linear mounts can be performed with non-linear models only, which are
mainly grounded powertrain models.

• If all targets and requirements are met, a ‘total vehicle system CAE model’ has
to be used to check whether the full vehicle NVH targets for all phenomena influenced
by the powertrain mounts are met, and to optimize the mount characteristics as
well as body and chassis performance.

If no detailed body information is available for the first total vehicle
calculations, CAE runs can be performed using surrogate body models or
simplified body CAE models. The following CAE tools are widely used to
support the powertrain mount design process:

ABAQUS:

Standard Finite Element program for linear and nonlinear calculations.

ADAMS:

Standard tool for Vehicle Dynamics applications. For NVH, its limited
upper frequency range severely restricts its use.

NASTRAN:

Standard Finite Element program for all kinds of calculations.

MOTRAN:

Set of programs developed by Ford for the NVH analysis
of vehicles. It allows modal models to be combined with simple elements,
e.g. linear springs, which massively reduces computing time.

HYBRID TECHNIQUES:

Techniques that allow calculated (CAE) and measured
(test) data to be combined.

It is essential to use total vehicle CAE models from this step on, as the rigid
body modes of a grounded powertrain can differ significantly from the modes of the
same powertrain with identical mounts in a vehicle. Therefore, a
post-optimization of the mount characteristics developed above is necessary. Idle as
well as drive vibrations can be calculated with the help of standard dynamic FE
packages. One sample result of such a calculation of steering wheel idle vibrations
is given in Fig. 4. It shows vibrations for the same load for two different sets of
engine mounts.

Estimates of total vehicle NVH performance above 80 Hz are not possible with
standard FE tools, at least not at this program stage. The only way to verify system
performance in a higher frequency range is to use hybrid techniques for so-called
‘Noise Path Analysis’ tools.

• If all NVH targets are met, a sensitivity analysis follows, which identifies main 
influence factors and enables definition of feasible tolerances or requires a 
complete redesign to achieve a more robust powertrain mounting system. 
After a feasibility check, the final proposal will be realized as hardware and 
will be tested in prototypes or demonstration vehicles. 




Figure 4. Idle Vibration of the Steering Wheel. The two curves show the calculated
accelerations of the steering wheel in z-direction.



• The development process described provides a fast way to find an
optimal layout of the engine mounting system from an NVH point of view.
During the design process, it is possible and often necessary to jump back to
an arbitrary earlier step. In each step, requirements from other areas such as
Vehicle Dynamics, Driveability, component areas and the supplier have to be
considered as soon as they are available. This raises the need for parallel
optimization for several attributes. It is impossible to solve this multi-criteria
optimization problem purely by CAE. A well-defined process must therefore be
established with the participation of all related areas, including the manufacturer,
right from the beginning. Only in this way are the necessary trade-off decisions
possible at an early stage.
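
The progressive mount characteristic required in the design process above (linear around zero load, smooth transition, stiffening only towards maximum load) can be sketched with a simple force-deflection law. The cubic hardening term and all parameter values below are assumptions chosen for illustration; real mount characteristics come from component design and measurement.

```python
def mount_force(x, k0, x_lin, k3):
    """Force (N) of a progressive mount at deflection x (m).

    Linear with stiffness k0 up to |x| = x_lin, then a cubic hardening term,
    so that force and tangent stiffness are continuous at the transition.
    """
    excess = max(abs(x) - x_lin, 0.0)
    sign = 1.0 if x >= 0.0 else -1.0
    return k0 * x + sign * k3 * excess ** 3


def tangent_stiffness(x, k0, x_lin, k3):
    """Tangent stiffness dF/dx (N/m) of the characteristic above."""
    excess = max(abs(x) - x_lin, 0.0)
    return k0 + 3.0 * k3 * excess ** 2


def deflection_at_load(force, k0, x_lin, k3, x_max=0.05, tol=1e-9):
    """Deflection (m) for a given positive load, found by bisection.

    Assumes 0 <= force <= mount_force(x_max, ...); the force law is monotonic.
    """
    lo, hi = 0.0, x_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mount_force(mid, k0, x_lin, k3) < force:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical mount data (not from the paper)
K0 = 150.0e3     # zero-load stiffness in N/m
X_LIN = 0.004    # end of the linear range in m
K3 = 5.0e9       # hardening coefficient in N/m^3

x_idle = deflection_at_load(300.0, K0, X_LIN, K3)    # assumed idle load
x_full = deflection_at_load(2580.0, K0, X_LIN, K3)   # assumed maximum load
print(f"idle: {x_idle * 1e3:.2f} mm, stiffness {tangent_stiffness(x_idle, K0, X_LIN, K3):.0f} N/m")
print(f"max:  {x_full * 1e3:.2f} mm, stiffness {tangent_stiffness(x_full, K0, X_LIN, K3):.0f} N/m")
```

With these assumed numbers the idle load stays in the linear range, where the stiffness equals the zero-load value, while the maximum load uses the progressive range, in line with the listed criteria.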

6. Torque Roll Axis System 

Since the introduction of E-W (transversely) installed powertrains in front-wheel driven vehicles, a
series of mounting systems has been developed. The design of the first systems was
mainly dominated by package reasons, whereas in recent years there has been a general
trend towards so-called ‘Torque Roll Axis’ (TRA) systems (Fig. 5). These are
powertrain mounting systems based mainly on the inertia characteristics of the
powertrain.




Figure 5. Torque Roll Axis and Principal Axis. 






The ‘Torque Roll Axis’ (TRA) is the theoretical axis about which a free powertrain
rotates if subjected to torque fluctuations about the crankshaft. The orientation of
the torque roll axis is defined only by the inertia properties of the powertrain; the
center of gravity is located on this axis. The roll axis, which is the axis of rotation
of the mounted powertrain when it is subjected to torque fluctuations, is not identical
with the torque roll axis. The roll axis depends on powertrain inertia, mount
stiffness and excitation frequency, whereas the (static) rock axis depends only on
the mount stiffness. The idea of so-called TRA mounting systems is to minimize
the difference between the roll axis and the TRA in order to achieve minimum
dynamic mount forces at idle, which are mainly caused by combustion pressure
torque fluctuations.
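
For a free rigid body, a small oscillating torque about the crankshaft direction e produces an angular acceleration proportional to J^-1 e, where J is the inertia tensor about the center of gravity. The direction of the TRA can therefore be sketched as below. The inertia tensor and the crankshaft direction are hypothetical values, and the helper names are introduced only for this illustration.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, b):
    """Solve m x = b for a regular 3x3 matrix using Cramer's rule."""
    d = det3(m)
    x = []
    for col in range(3):
        # Replace column 'col' of m by the right-hand side b
        mc = [[b[r] if c == col else m[r][c] for c in range(3)] for r in range(3)]
        x.append(det3(mc) / d)
    return x


def torque_roll_axis(inertia, crank_axis):
    """Unit vector of the TRA: direction of J^-1 e through the CG."""
    alpha = solve3(inertia, crank_axis)
    norm = math.sqrt(sum(a * a for a in alpha))
    return [a / norm for a in alpha]


# Hypothetical inertia tensor about the CG in kg m^2 (vehicle axes x, y, z)
J = [[10.0, -2.0, 0.0],
     [-2.0, 15.0, 0.0],
     [0.0, 0.0, 18.0]]
CRANK = [0.0, 1.0, 0.0]   # assumed crankshaft direction for an E-W installation

tra = torque_roll_axis(J, CRANK)
cos_angle = sum(t * c for t, c in zip(tra, CRANK))
angle_deg = math.degrees(math.acos(cos_angle))
print(f"TRA direction: {tra}, angle to crankshaft: {angle_deg:.1f} deg")
```

The TRA coincides with the crankshaft direction only if that direction is a principal axis of inertia; otherwise the products of inertia tilt it, as in this assumed example.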



7. Scattering Variables 

For customer satisfaction it is necessary to design and manufacture a robust
vehicle. Large scatter in the parameters describing the overall NVH
performance of the vehicle is not tolerable. The main parameters identified as
responsible for the scatter of the interior noise and of the vibrations at certain contact
points in the vehicle are:

• Engine mount idle stiffnesses. Here a certain tolerance level is realistic for
production parts. The exact value depends on the kind of mount (rubber
or hydraulic) and on the manufacturing process.

• Engine mount high-frequency dynamic stiffnesses. These are not considered in this paper.

• Maximum preload values due to weight of the powertrain and/or engine 
torque. A deviation of ± (5... 10) % seems to be a realistic lower tolerance 
bound. This scattering is critical because of large deviations of the dynamic 
engine mount stiffnesses with scattering preloads. 

• Load variation in idle with electrical consumers on/off. 

• Mounting positions. 

Fig. 4 shows calculated accelerations at the steering wheel in z-direction for two
different sets of engine mount idle stiffnesses. The original set with nominal stiffnesses
shows accelerations at a lower level than the second set. The stiffness data of the
second set are varied within the production tolerances. The necessary calculations
are performed with a total vehicle CAE model which includes a detailed modal
model of the flexible body.

8. Formulation of an Optimization Problem

For the optimization of the engine mounting system an intuitive approach is
unsatisfactory. The task must therefore be formulated as an optimization problem and
solved with an optimization procedure.

8.1 Definition of a Stochastic Optimization Problem [5] 



A simplified definition of the continuous, stochastic optimization problem is given 
in the following. 



$$
\underset{x \in D}{\text{``Min''}} \; f_a(Z) \tag{8.1}
$$

with

$$
f_a(Z) = \lambda_1 \, E[f(Z)] + \lambda_2 \, V[f(Z)], \qquad Z = (X^T, P^T)^T, \tag{8.2}
$$

$$
D = \{\, x \in \mathbb{R}^n \mid h_j = 0 \;\; \forall j = 1,\dots,n_h; \;\; P_{f_k} = P[g_k(Z) < 0] \le \bar{P}_{f_k} \;\; \forall k = 1,\dots,n_g; \;\; x_{kl} \le x_k \le x_{ku} \;\; \forall k = 1,\dots,n \,\}, \tag{8.3}
$$

the vector of inequality constraints

$$
g = (g_1, g_2, \dots, g_{n_g})^T \tag{8.4}
$$

and

$f_a$  augmented objective function, here interior noise or vibrations,

$g_k$  inequality constraints as a function of stochastic variables, here frequency
conditions,

$h_j$  equality constraints $h_j = h_j(p)$, here the system equations,

$Z$  vector of the stochastic variables,

$X$  vector of the stochastic design variables, here mount stiffnesses and/or
coordinates,

$P$  vector of the stochastic parameters,

$E(f(Z))$  expected value of the objective function,

$V(f(Z))$  variance of the objective function,

$\lambda_1, \lambda_2$  weighting factors,

$D$  feasible design space,

$P_{f_k}$  failure probability of the k-th inequality constraint,

$\bar{P}_{f_k}$  feasible value of the failure probability,

$n_g$  number of stochastic inequality constraints,

$x_{kl}, x_{ku}$  lower and upper bounds of the design variables, respectively.

Here X = P is assumed.



8.2 Stochastic Optimization Procedures 

The optimization problem defined in equation (8.1) can be solved by means of
different procedures. By integrating a stochastic optimization procedure into the
optimization procedure SAPOP [4], a complete and extensive optimization
environment becomes available that additionally allows further optimization
strategies to be used in combination with stochastic optimization [5]. Thus, the stochastic
optimization problem in (8.1) is transformed into a quasi-deterministic
optimization problem with reliability constraints. Here, the fulfillment of constraints
is expressed in terms of probabilities, which are considered during optimization.
On the other hand, the deviation of the objective function owing to the stochastic
distribution of the design variables and constraints is neglected.
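
The quantities that enter the stochastic problem, i.e. the expected value and variance of the objective and the failure probabilities of the inequality constraints, can be estimated by plain Monte Carlo sampling. The sketch below is not the SAPOP procedure; it is a minimal illustration in which the stiffness values, the tolerance (interpreted as a 3-sigma band of a normal distribution), the stand-in objective and the frequency constraints are all assumptions.

```python
import math
import random
import statistics


def evaluate_design(nominal, rel_tol, objective, constraints,
                    lam1=1.0, lam2=1.0, n_samples=2000, seed=0):
    """Estimate lam1*E[f] + lam2*V[f] and the failure probabilities P[g_k < 0]."""
    rng = random.Random(seed)
    f_values = []
    failures = [0] * len(constraints)
    for _ in range(n_samples):
        # Assumption: each variable scatters normally, tolerance = 3 sigma
        z = [rng.gauss(x, rel_tol * abs(x) / 3.0) for x in nominal]
        f_values.append(objective(z))
        for k, g in enumerate(constraints):
            if g(z) < 0.0:
                failures[k] += 1
    mean = statistics.fmean(f_values)
    variance = statistics.pvariance(f_values, mean)
    return lam1 * mean + lam2 * variance, [n / n_samples for n in failures]


# Hypothetical data: two mount stiffnesses carrying a rigid mass
MASS = 200.0                    # kg
NOMINAL = [200.0e3, 200.0e3]    # N/m


def mode_frequency(z):
    """Bounce frequency (Hz) of the mass on the two parallel mounts."""
    return math.sqrt(sum(z) / MASS) / (2.0 * math.pi)


def objective(z):
    """Stand-in for a vibration level: total stiffness relative to nominal."""
    return sum(z) / sum(NOMINAL)


F_NOMINAL = mode_frequency(NOMINAL)
constraints = [
    lambda z: mode_frequency(z) - 6.25,        # mode above an assumed 1/2 order
    lambda z: mode_frequency(z) - F_NOMINAL,   # fails in about half of the samples
]

f_aug, p_fail = evaluate_design(NOMINAL, 0.15, objective, constraints)
print(f"augmented objective: {f_aug:.4f}, failure probabilities: {p_fail}")
```

Each sample here costs one function call. With a full vehicle model every sample is a structural analysis, which is why the number of stochastic variables and the model size have to be kept small in practice.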

8.3 Practical Solution of the Stochastic Optimization Problem 

For the solution of the optimization problem mentioned above it is necessary to have a
structural model of the total vehicle. This includes

• detailed Finite Element (FE) body model, 

• concept FE chassis model, 

• detailed FE powertrain model for interior noise or concept FE 
model for vibration calculations, 

• cavity FE or Boundary Element model (only for interior noise 
calculations). 

The CPU time for one structural analysis is on the order of several hours on a
supercomputer. A stochastic optimization is therefore only possible

• with the help of simplified models where sufficient,

• with the use of model reduction techniques like superelements or
modal models,

• with the number of stochastic variables limited to fewer than 10.

9. Robust Design with DOE Methods 



With CAE methods it is possible, at a very early stage of the program, to
investigate the robustness of the total vehicle and of different subsystems, e.g. the
powertrain mounting system. One of the methods is the Design of Experiments
(DOE) [6, 7] methodology. In combination with Response Surface Methodologies
(RSM), which can be used very effectively with CAE models and tests, an
optimization is also possible. Two different approaches are:

• Design of a robust powertrain mounting system. This is limited to the concept
phase of the program.

• Investigation of the robustness of a given system, identification of sensitive
parameters and formulation of feasible tolerances for the mechanical quantities
of the powertrain mounting system, such as stiffnesses, and for the manufacturing
process.

The first approach is certainly the best but often not possible. The second one is
applicable in a running development with a given concept but can only improve a
given system. DOE makes it possible to quantify the influence of varying the dynamic stiffness
of each powertrain mount in each direction on both the powertrain forces and the
vehicle NVH performance. Changing the stiffness of one mount in one direction only
can influence the forces of other mounts drastically.

Fig. 6 shows adjusted response curves as a result of a typical DOE study
performed with the software package RS [7]. The structural model is a detailed full
vehicle model, but with a modal representation of the body. This reduces
the structural analysis time by a factor of about 20. In a first step, no interactions
between the variables are assumed. For a system with three engine mounts, the
variables are the stiffnesses and locations of the engine mounts; one independent
stiffness is assumed for each direction of each mount, and constrained position
variables are not included.




Figure 6. Adjusted Response Curves for the DOE Study.

The adjusted response surface (NVH index)

$$
\mathrm{NVH}_{\mathrm{index}}(Z) = f(X_1, X_2, \dots, X_n) \tag{9.1}
$$

is a normalized scalar measure of the vibrations at certain points in the vehicle. A
greater value indicates smaller vibration amplitudes. For vehicle
development the target is to maximize this value. Fig. 6 shows that the
influence of three of the adjusted variables dominates that of the other
variables. In a further step a new DOE is necessary that includes interactions but is
limited to the main variables identified. With this result the RSM is used to find an
optimal design.
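
The first screening step described above, main effects without interactions followed by a ranking of the variables, can be sketched with a two-level full factorial design. This is a generic DOE illustration and not the RS package used in the study; the response function is a hypothetical stand-in for a CAE run, and the number of variables is reduced to four to keep the example short.

```python
from itertools import product


def main_effects(response, n_factors):
    """Main effect of each factor from a two-level full factorial design.

    Factors are coded as -1 (low level) and +1 (high level). The main effect is
    the mean response at the high level minus the mean response at the low level.
    """
    runs = list(product((-1.0, 1.0), repeat=n_factors))
    results = [response(run) for run in runs]
    effects = []
    for i in range(n_factors):
        high = [y for run, y in zip(runs, results) if run[i] > 0]
        low = [y for run, y in zip(runs, results) if run[i] < 0]
        effects.append(sum(high) / len(high) - sum(low) / len(low))
    return effects


def nvh_index(x):
    """Hypothetical NVH index in coded variables (stand-in for a CAE analysis)."""
    return (5.0 + 0.8 * x[0] - 0.1 * x[1] + 0.6 * x[2] + 0.05 * x[3]
            + 0.3 * x[0] * x[2])


effects = main_effects(nvh_index, 4)
# Rank the variables by the magnitude of their main effect
ranking = sorted(range(len(effects)), key=lambda i: abs(effects[i]), reverse=True)
print("main effects:", [round(e, 3) for e in effects])
print("ranking (most influential first):", ranking)
```

The interaction term of the assumed response does not show up in the main effects of a balanced design, which mirrors the text: once the dominating variables are identified, a second DOE restricted to them is needed to capture interactions. For many variables a full factorial design becomes too expensive, so fractional designs would be used instead.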



10. Conclusion 

The powertrain mounting system is one of the key systems for the overall NVH
behavior of the vehicle. This paper briefly presents the mechanical
background of this system and explains the stochastic character of the
optimization problem. Because of the complex structural model, only a simplified
approach to the solution is applicable. With the proposed procedure, the
main sensitive parameters can be identified. In later
development, the optimization of these parameters is necessary. Currently, the
investigation of all parameters is state of the art. With the given methods, a
drastic reduction of this effort will be possible in future.